I’m still here – 2

I bet you thought I’d gone away, given up on this project. Well once before I had to remind you I’m still here so it’s time for another status update.

My life circumstances have strongly interfered with developing both my application and its associated vocabulary, but that doesn’t mean I’ve forgotten about this. Actually the pause in my activity provides an opportunity to test my picking algorithm after a long gap: which words will show up (those not recently drilled or those with the worst scores)?

So after a two-month pause I’ve returned to slowly adding more quiz definitions (to existing defined words), which is a tedious process, but I’m getting a few more done; so much so, in fact, that the undrilled words are now increasing in number.

Meanwhile, without many drills, my previous quiz results are aging (the average time since last drill, computed per word and then averaged, is about 75 days), so I have a lot of catching up to do.

So I just wanted to say I haven’t given up on this project and maybe I’ll soon return to more posting, progress in coding, progress in developing the vocabulary, and more reports of interesting words.

So stay tuned.

21,000 Drills!

I just consolidated all my drill (quiz) results into a single file (with all terms from the latest file) and found I’ve done 20,993 drills – amazing. It takes about 8.3 seconds per drill so that’s 48.4 hours of drills over 260 calendar days (172 with drills). While I’ve continued to add words (and more importantly the quiz definitions for these words) my drills are for 1587 words (latest count) out of 1613 possible. So here’s my rate of doing drills (30 day moving average):

vocaDrill-3

The peak rate was achieved around 3/1/2015, when I was intensely concentrating on drills, but as you can see that rate dropped a lot (mostly due to working on the synonym project) and is now picking up again.
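For anyone curious how a graph like this is produced, here is a minimal sketch of a trailing 30-day moving average of drills per day, plus the hours arithmetic from above. This is not my actual application code: the function names and the idea of feeding in one date per drill are assumptions made just for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def moving_average_rate(drill_dates, window_days=30):
    """Map each day to the average drills/day over the trailing window.

    drill_dates holds one datetime.date per drill (an assumed input format).
    Days before the first drill count as zero drills.
    """
    per_day = Counter(drill_dates)
    if not per_day:
        return {}
    first, last = min(per_day), max(per_day)
    rates = {}
    day = first
    while day <= last:
        # Sum the drills on this day and the (window_days - 1) days before it.
        total = sum(per_day.get(day - timedelta(days=k), 0)
                    for k in range(window_days))
        rates[day] = total / window_days
        day += timedelta(days=1)
    return rates


def total_hours(n_drills, seconds_per_drill):
    """Total drill time in hours."""
    return n_drills * seconds_per_drill / 3600.0


if __name__ == "__main__":
    # The figures quoted in the post: 20,993 drills at about 8.3 s each.
    print(round(total_hours(20993, 8.3), 1))  # 48.4
    sample = [date(2015, 3, 1)] * 30 + [date(2015, 3, 2)] * 60
    print(moving_average_rate(sample)[date(2015, 3, 2)])
```

A trailing window (rather than a centered one) means the curve lags the real change in activity, which is fine for a status graph like this one.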

It’s interesting that after the drop in drill rate I’m now getting more errors and/or not knowing the words, which does show the effect of short-term learning (i.e. while drilling intensively I remember words, briefly, and then forget them again). This large amount of data will now give me more chance to test my biased picking algorithm.

Here’s the distribution of average score (per word):

vocaDrill-3A

And a bit more detail for the lower scores:

vocaDrill-3B

I think these scores are unrealistically high (due to the short-term learning effect) but it’s good to see (in the first graph) that now relatively few words have a 100% score so I’m making mistakes and/or forgetting and thus getting a better distribution of how well I know the words in my vocabulary.

And here’s the distribution of #drills/word, which is distorted due to my focus (for a brief period) on just the words where I’d previously gotten poor scores (which also improved during that intensive drill on that subset, around 600 words if I remember correctly). I’d expect this now to smooth out a little bit as I do more drills in this consolidated file:

vocaDrill-3C

I’ll continue to add quizdefs (still missing about 500) and do more drills in that in-process file (to get some drills of newer words) before consolidating again. The size of all my drill history data is getting rather large (25,617,979 bytes) so I need to figure out how I’m going to consolidate all that data (per word, rather than just in history) so future drills can use my entire history despite the inconsistencies in how all that history was accumulated (i.e. different drill methods, different subsets of words).

 

zeitgeist vs ethos

I just missed this in a quiz so I thought I’d use this pair as today’s WOTD.

This is a good example of how I need to use some more intelligence in my quiz generation. The quiz offered two choices (one for zeitgeist, which is the one I picked and is wrong, and another for ethos) and so it was very difficult to distinguish which definition best fits ethos.

So let’s start with dictionary definitions, zeitgeist first:

n.(Oxford)
the defining spirit or mood of a particular period of history as shown by the ideas and beliefs of the time

n.(dictionary.reference)
the spirit of the time; general trend of thought or feeling characteristic of a particular period of time

the prevailing viewpoints, attitudes, and beliefs of a given generation or period in history

Fairly straightforward with all three definitions reasonably consistent, so let’s look at ethos:

n. (Oxford)
the characteristic spirit of a culture, era, or community as manifested in its beliefs and aspirations

n.(dictionary.reference)
1. [sociology] the fundamental character or spirit of a culture; the underlying sentiment that informs the beliefs, customs, or practices of a group or society; dominant assumptions of a people or period
2. the character or disposition of a community, group, person, etc.
3. the moral element in dramatic literature that determines a character’s action rather than his or her thought or emotion

the core principles or beliefs of a religion, culture, or community

Now with the exception of using ‘era’ in the Oxford definition (or ‘period’ in dictionary.reference definition) the sense of ‘particular period of time/history’ in zeitgeist’s definition is muted in the definition for ethos.  So these are relatively subtle distinctions. Here’s the quiz I missed:

ethos/ethos deduce(wrong) -0.9 on 5/16/2015 12:24:36 PM
[4] the fundamental character or spirit of a culture; the underlying sentiment that informs the beliefs, customs, or practices of a group or society
[1] zeitgeist | the spirit of the time; general trend of thought or feeling characteristic of a particular period of time

The first answer listed (green in the app) is correct and the second (red in the app) is the one I picked. In this case there is a clear distinction between ‘spirit of the time’ and ‘fundamental character or spirit’ so actually this was a good quiz (to teach me) even if it is purely random (in the quiz generation algorithm) that these two definitions were my choices. So, IOW, this really was a mistake on my part and I deserve the negative score I got (which will bias the frequency, in the future, that I will see ethos as a quizword).

So that takes care of the meanings of these two very similar words, and the pair stands as an excellent test of whether I could create code, in the quiz generator, to deliberately trigger this quiz, esp. given I got it wrong before, so that in the future I’m forced to get it right.
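Here is a rough sketch of both ideas: biasing how often a quizword gets picked, and deliberately reusing a choice I previously confused with it. Everything in it is hypothetical rather than my real quiz generator: the weight formula, the parameter values, the assumption that average scores run from -1 to 1, and the `confusions` table are all made up for illustration.

```python
import random


def pick_weight(avg_score, days_since_drill, score_bias=2.0, age_bias=0.02):
    """Hypothetical picking weight: larger for low scores and for stale words.

    avg_score is assumed to lie in [-1, 1]; the two bias values are
    arbitrary tuning knobs.
    """
    score_term = score_bias * (1.0 - avg_score) / 2.0  # 0 when perfect
    age_term = age_bias * days_since_drill
    return 1.0 + score_term + age_term


def pick_quizword(words, rng=random):
    """Pick one word at random, in proportion to its weight.

    words maps each word to (avg_score, days_since_drill).
    """
    names = list(words)
    weights = [pick_weight(*words[name]) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def choose_distractors(quizword, confusions, pool, n=3, rng=random):
    """Choose n wrong choices, putting previously confused words first.

    confusions maps a quizword to the words whose definitions were
    wrongly picked for it in earlier quizzes (an assumed summary of history).
    """
    forced = [w for w in confusions.get(quizword, []) if w != quizword][:n]
    rest = [w for w in pool if w != quizword and w not in forced]
    fill = rng.sample(rest, min(n - len(forced), len(rest)))
    return forced + fill
```

With a table like `{"ethos": ["zeitgeist"]}` the zeitgeist definition would always be offered when ethos comes up, instead of that pairing happening purely by chance.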

Note to self: On wrong answers analyzing not just the score, but also exactly what was wrong, i.e. boneheaded stupid answer or ignorant answer or difficult to distinguish answer (thus suitable for more drill) would be an interesting challenge. Fortunately I took the approach of recording everything about quizzes in my <history> section of the XML file, but as this portion of the file is now huge and I’ve been contemplating how to summarize it (per word entry) I need to consider ambiguity of which definitions cause wrong answers (per word or per all choices in the quiz).

Any reader is welcome to chime in with any ideas how to use all this recorded information to create better quizzes.

btw: Using my code for synonyms is interesting for this pair of words. For ethos, here are the closest matches based on overlap of synonyms:

animus 0.4250
noumenon 0.2424
mores 0.2250
wraith 0.1696
subliminal 0.1492
vivaciousness 0.1492
inculcation 0.1397
accidence 0.1202
humanism 0.1161
zeitgeist 0.1094

so zeitgeist is a very low-scored match and so wouldn’t (by this algorithm) be considered very similar to ethos. Looking at zeitgeist the best matches are:

milieu 0.5000
penchant 0.3750
propensity 0.3700
predisposition 0.3611
proclivity 0.3125
vivaciousness 0.2984
predilection 0.2775
declivity 0.2500
noumenon 0.2424
purport 0.2263
animus 0.2125
ardency 0.2081
wraith 0.1696
affective 0.1696
nimbus 0.1563
numen 0.1544
zealousness 0.1455
volubility 0.1440
plaintiveness 0.1375
gestalt 0.1375
pathos 0.1375
poignance 0.1375
impetuosity 0.1320
poignancy 0.1307
inflection 0.1161
inflections 0.1161
ethos 0.1094

Now the much closer match of milieu makes sense but it’s interesting how low the match with ethos is.

IOW, synonym matching would be useless, for this pair, for determining that these words are very similar.

Interesting challenge then – what algorithm would group these words together as similar?

Vocabulary Measurements

It’s been about 10 months since I first began to accumulate my vocabulary so now is a good time to report some results. As a reminder, what I’m doing here is accumulating a fairly large list of words that I know a bit or not at all so I can use the custom application I’m writing as a learning tool. But in order to accomplish this I have to: a) actually write the application, a major work-in-progress (with little work in the last few months), and, b) manually accumulate the data, which breaks down to: 1) collecting the words themselves (mostly from encountering the word while reading in Kindle and recording the word in the Vocabulary App of the Kindle, but also from some other sources), 2) entering the words and their primary definition (from Oxford Online Dictionary) and later secondary definition (from dictionary.reference.com) and for some of the words from a good book of 1200 interesting books, 3) extracting and coding up, from those definitions, the definitions to use in quizzes, and, 4) “rating” the words (how important to learn, how well I think I already know them). Additional steps, either somewhat started or still to do, are then to: a) add synonyms (and attempt to use them to measure “similarity” between words), and, b) proofread and review all the inputs. Plus as I do the work I notice typos and/or sometimes do spellchecks (tedious) with MSWord to spot and fix errors.

So given that process where do I stand?

vocabStatus-3a

The graph above shows my cumulative progress. The blue markers and line show the total number of words I have defined (primary definition) in my XML vocabulary file. The red markers and line show the total number of words that have quiz definitions and are eligible for drilling in quizzes. The green markers and line show my most recent number of drills (I have a lot more, about 15000, in some archive files).

And recently here’s a different look at my progress:

vocabStatus-3b

The blue data is the total number of words and the red data is the number of words with quiz definitions (as on the previous graph, but for a more recent time). The green markers and line show the backlog of words that need quizdefs. Adding quizdefs is a process that is harder (and that I like less) than the raw input so I tend to delay on this. That the backlog has risen, of late, is a consequence of the separate sub-project I’m doing on synonyms, which has provided a lot of new words (plus the ones I routinely find reading my Kindle) where it is easier (plus exigent) to add words but then delay on getting around to generating the quizdefs. I’m mostly holding off doing any rating until I get the backlog of words completely into the vocabulary itself.

And I have about another 200-300 words in a separate file (extracted from various sources) I need to add and I’m slowly working down that backlog.

So that gives me a couple of overall stats on how much time this data gathering process is taking (and this is only the work I can measure in the app, there is probably 2X this much that gets done but isn’t explicitly measured):

vocabStatus-3c

So I have about 110 hours invested in just input activity (with an estimated backlog of another 23 hours to go). With other additional words to input and process I’d estimate I’ll need about 150 hours total (which seems, intuitively, to be much less than I feel I’ve actually spent).

So here’s a few statistics about the vocabulary file itself:

total #nodes=26527 total #attributes=85439 total xmlsize=5,071,656
terms: #subnodes=2028 xmlsize=2,330,086 txtsize=976,525 filesize=5,565,612
#words=2028 #subnodes=6757 ave#=3.33 #refs=875
derivsz=72,214 avesz=35.61 #derivs=6611 ave#=3.26
defsz=588,073 avesz=289.98 fraction=0.106 qdefsz=310,937
#rated=980 #drills=21 fraction=0.483
#qdefwords=(1538,882) #qdefs=5644 ave=3.670 fraction=0.758

This won’t mean much to you, Dear Reader, but I’m putting this here for my own historical reference.

Now meanwhile I have accumulated (in a separate process, still to merge into main XML) about 2000 words with their synonyms. This is another ongoing project where I need to complete work and some research, then some code, before merging this data into the main file.

And I’ve done about 15,000 drills where I have detailed data that I need to reduce to a simpler form to actually have a historical “score” to determine how well I’m doing on any particular word. I’ve shown various status reports on this before so I won’t go into that data here.

And I have a lot to do, both the data collection and review and the coding, so even if I really concentrated on this project I doubt I’d complete it (to version 1.0 level) by the end of a calendar year of working on it! Whew! Never thought it would get this involved so this is a good time-killer.

And someday I also need to rework all this to do a vocabulary (and changes to app) for drill in other languages. We’re considering a fall trip to Quebec, so soon I’ll need to look into food and some common terms in French (my second, but rarely used language).

So lots to do, so keep on keepin’ on (as they say in thru-hiking long trails).

fungible and frangible

These words are fairly close in spelling but very different in meaning so it should be possible to keep them straight. If you’re an economist or financial expert you’ll find many more opportunities to use fungible but a gardener will find more opportunities to use frangible.

So let’s start with the gardener’s word, frangible:

adj.
{formal} fragile; brittle

easily broken; breakable

easily breakable

While the strict meaning could apply as easily to glassware this word is most often used about soil, i.e. whether the soil sticks together in horrible clumps (like California clay) or is nice and loose like well-prepared soil.

fungible is more narrowly defined:

adj.
[law] (of goods contracted for without an individual specimen being specified) able to replace or be replaced by another identical item; mutually interchangeable

(especially of goods) being of such nature or kind as to be freely exchangeable or replaceable, in whole or in part, for another of like nature or kind

freely exchangeable for another of like nature; interchangeable

So fungible most often is tied to the concept of some object that can be used as money; actual money (at least currencies with free exchange systems like the dollar or euro) is very fungible. Some libertarian nuts believe this applies to gold, but gold is not money (just try to buy lunch with it, or for the most part with bitcoins as well). Still, it is certainly true that one bitcoin should be exchangeable for another and, if it’s not a scam, sometimes for actual useful money.

pernicious and pertinacious

Here are a couple of words that I’ve missed in quizzes even though I have a good idea about the meaning of pernicious (not so clear on pertinacious) so let’s do these two as Word Of The Day since I haven’t done any of these posts for a while (lost in synonyms project, still not ready to publish any conclusions).

So, what about pernicious:

adj.
having a harmful effect, esp. in a gradual or subtle way

1. causing insidious harm or ruin; ruinous; injurious; hurtful
2. deadly; fatal
3. {obsolete} evil; wicked

resulting in damage or harm; having a debilitating effect

One reason I can (usually) remember this is by thinking of the phrase pernicious anemia, where pernicious is the qualifier indicating that the anemia really is a problem (rather than merely an unusual state), although in this phrase it doesn’t quite reach the level of implying the deadly or fatal meaning (it might be, but it sits more on the harmful side). It’s interesting to see the {obsolete} meaning as I think this may often also be implied in various contexts where pernicious is used.

So let’s move on to pertinacious:

adj.
{formal} holding firmly to an opinion or a course of action

1. holding tenaciously to a purpose, course of action, or opinion; resolute
2. stubborn or obstinate
3. extremely or objectionably persistent

I think ‘pert’ confuses me when I think of this word; there is no connection at all and I should think ‘persistent’ instead. This word applies to lots of people I know and lots of issues that involve me so I really should put this word on the tip of my tongue – how about you? Can’t you use this a lot? And it might be possible to easily combine these:

The ___ (fill in the blank) party is so pertinacious about taxes that their policies have a really pernicious effect on the economy.

btw: Let’s also include a bit of what my synonyms project shows. For pernicious the words whose synonyms most closely match the synonyms of pernicious are:

miasmatic(1.7583), maleficent(1.7515),
miasmic(1.5979), pestilent(1.1583),
noxious(1.1333), baneful(1.1333),
malefic(1.1151), pestilential(1.0619),
baleful(1.0208), deleterious(0.9802),
nocuous(0.9500), malign(0.8690),
noisome(0.8641), prejudicious(0.8521),
pestiferous(0.8295), insalubrious(0.7780),
nocent(0.7455), malevolent(0.7102)

and the top matches in the synonym tree (expanded to 4 degrees of separation) are:

pernicious:  #found=873,  Top 10:  (malign 3.762 1) (noxious 3.556 1) (pernicious 3.336 2) (censure 2.909 3) (deleterious 2.809 1) (pestilential 2.697 1) (malevolent 2.447 1) (vilify 2.409 2) (pestiferous 2.305 1) (baleful 2.212 1)

And for pertinacious the closely matching words are:

obstinate (0.9731), mulish (0.9553),
intransigent (0.8690), refractory (0.6686),
intractable (0.6517), obdurate (0.6167),
perverse (0.5348), contumacious (0.4964),
splenetic (0.4808), inexorable (0.4543),
importunate (0.4458), indefatigable (0.4137),
adamantine (0.3917), restive (0.2938),
dissentient (0.2839), inveterate (0.2682),
peevish (0.2170), stoical (0.2134)

Note that the scores are much lower here so pernicious appears to have words that are more “similar” as measured by synonym overlap. And the four degrees of separation tree for pertinacious is:

pertinaciousness:  #found=4,  Top 10:  (obstinacy 1.335 1) (perverseness 1.168 1) (mulishness 1.168 1) (pertinaciousness 0.838 2)

so the synonym tree for pertinacious is much more sparse (which could imply either fewer synonyms, or that more of its synonyms are just common words and not in my vocabulary), so let’s check that:

Here are the unduplicated synonyms (all words):

pernicious: bad hurtful damaging dangerous deadly destructive detrimental devastating harmful lethal malicious nefarious noxious poisonous ruinous toxic virulent baleful deleterious evil fatal iniquitous injurious killing maleficent malevolent malign malignant miasmatic miasmic mortal noisome offensive pestiferous pestilent pestilential prejudicial sinister venomous wicked

pertinacious: resolute attentive determined dogged persistent tenacious obstinate insistent perverse stubborn bullheaded firm headstrong inflexible unshakable

So it appears the main difference, in synonym analysis techniques, is merely the much larger list of synonyms for pernicious than pertinacious. Now could this just be an artifact of this source?

So a different source has these synonyms:

pernicious: adverse, bad, baleful, corrupting, damaging, dangerous, deleterious, destructive, detrimental, evil, harmful, hurtful, inimical, injurious, maleficent, malevolent, malign, malignant, noxious, poisonous, unfavorable, unhealthy, wicked

pertinacious: assiduous, contrary, determined, dogged, headstrong, implacable, importunate, indefatigable, inflexible, insistent, intractable, intransigent, mulish, obdurate, obstinate, persevering, persistent, perverse, purposeful, refractory, relentless, resolute, stubborn, tenacious, tireless, unbending, uncompromising, unrelenting, unshakeable, unyielding, wilful

So the second source has far more synonyms for pertinacious than pernicious. It’s a pain to attempt to actually compare the lists so I won’t but this does imply that source may have a lot of influence on these algorithms. OH JOY, now this means more data collection, i.e. getting alternative lists of synonyms from another source and more analysis – will this synonym project never end!

My synonym project is out of control

As I’ve previously reported here I started a “proof of concept” project to attempt to use synonyms to measure “similarity” between words (in my drill vocabulary). My first idea, to create synonym trees (by degrees of separation), was pretty much a bust – it didn’t seem to help that much, partly because it ignores the “common” words I don’t have in my vocabulary and thus the trees are very skewed.

So I started a different approach. I compare word X to word Y by comparing X’s synonyms to Y’s synonyms – the greater the match (with some tuned parameters for creating the “match” metric) the more similar the words. This works better but it’s still not clear how useful it is.
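To make the overlap idea concrete, here is a minimal sketch that scores two words by how much their synonym lists overlap and then ranks every other word against a given one. Note the stand-in: it uses plain Jaccard overlap (shared synonyms divided by all synonyms of either word), which is not my tuned “match” metric; the function names and the dictionary layout are likewise assumptions for illustration.

```python
def overlap_score(syns_x, syns_y):
    """Jaccard overlap of two synonym lists: |shared| / |either|.

    A simple stand-in for a tuned match metric; it ranges from 0 to 1.
    """
    a, b = set(syns_x), set(syns_y)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_matches(word, synonyms, top=10):
    """Rank all other words by synonym overlap with `word`.

    synonyms maps each word to its list of synonyms (assumed layout).
    Words with no overlap at all are dropped.
    """
    scored = []
    for other, other_syns in synonyms.items():
        if other == word:
            continue
        score = overlap_score(synonyms[word], other_syns)
        if score > 0:
            scored.append((other, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

Unlike the tree approach, this comparison counts the common words too, since a shared synonym matters whether or not it is in the drill vocabulary.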

But the real problem is that I digressed on getting too much data. If I just use a bit of data (synonyms for a small list of words) I may get misleading results, so I set out to get much more data. But a lot of data then creates so much to look at it’s hard to really understand the results. But worse, in my sorta OCDish way I got so preoccupied with accumulating data I’ve done little else for the past two weeks and spent much more time getting data (manually extracting synonym lists from my source, which I’m finding has many deficiencies, but that’s a different post).

My data file (manually accumulated with MSWord) has 15,658 unique synonyms that I don’t have defined (haven’t extracted their synonyms) and 2,332 words where I do have synonyms. Altogether my MSWord file is now 88,574 words (many of these are duplicates and/or parts-of-speech metadata). I have 43 words (now placeholders) still to do (and these are the “ugly” ones, massively duplicated synonyms, a pain to extract) and 157 words from my vocabulary (of about 1800) that have no synonyms in the source I’m using. That’s a lot of typing just to get a test data set.

But extracting data manually has its advantages (say, in comparison to just finding a machine-readable list someone else created and posted on the Net). At least I know, fairly well, what the source of my data is and some of its limitations. (This project reminded me of where I learned this lesson long ago in college. I was helping a grad student with his thesis research and in a tiny unventilated room, where I was using a solvent similar to “glue” (i.e. what kids sniffed), I was doing the mind-numbing work of accumulating data on samples. At one point, undoubtedly high from the fumes, I mixed up my samples, but continued to extract data which I turned over to the grad student. A few days later he was conducting a symposium where he went on and on about a bump in the data on a graph extracted from my raw data. I’m in the back of the room groaning, knowing the “bump” might be where I mixed up samples, but as he didn’t do the work himself he didn’t know what the raw data was. Fortunately it turns out there was a good reason for the bump (a phase transition in the metal) and not just my data error. But I learned the lesson to ALWAYS do some of the data gathering yourself so you really know about the data (and its limitations and problems) and not just depend on someone else.)

But now that I have a lot of data, it’s hard to understand (form a mental model of) what my analysis of it really means, given that my algorithms have numerous tunable parameters that need to be set properly.

For instance, the source, among its numerous issues, has a tendency to have highly redundant sublists of synonyms (really it’s doing something similar to my synonym tree approach). I first encountered this as a simple bug in the code where I assumed (wrongly, of course) that synonyms for a particular word would be unique. So after fixing the bug that’s now something I measure. For instance, the word ‘bravura’ has 569 synonyms, but really only 98 unduplicated ones, so the ratio of unique synonyms to raw synonyms (directly extracted from the source) is 0.172. In fact, 471 words have some amount of duplication. Here’s a graph of the 81 words that have a ratio (unique/raw) of < 0.5:

synonym-4

The horizontal axis is #raw synonyms (directly from the source) and the vertical axis is the number of unduplicated synonyms; for this set there is about 3:1 duplication. Now the overall results aren’t that bad:

synonym4A

As you can see, about 4/5ths of the words have an unduplicated list of synonyms.
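The duplication measurement above boils down to a one-line ratio. Here is a small sketch of it; the function names and the dictionary of raw lists are assumptions for illustration, not my actual code.

```python
def duplication_ratio(raw_synonyms):
    """Ratio of unique synonyms to raw synonyms; 1.0 means no duplicates."""
    if not raw_synonyms:
        return 1.0
    return len(set(raw_synonyms)) / len(raw_synonyms)


def heavily_duplicated(raw_lists, threshold=0.5):
    """Return {word: ratio} for words whose ratio falls below the threshold.

    raw_lists maps each word to its synonyms exactly as extracted
    from the source, duplicates included (assumed layout).
    """
    ratios = {word: duplication_ratio(syns) for word, syns in raw_lists.items()}
    return {word: r for word, r in ratios.items() if r < threshold}


if __name__ == "__main__":
    # The bravura figures quoted above: 98 unique out of 569 raw.
    print(round(98 / 569, 3))  # 0.172
```

Calling `heavily_duplicated` with a threshold of 1.0 instead lists every word with any duplication at all.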

Now actually I wanted to show more analysis in this post but I’ve run out of time so I’ll have to return to the rest of this later.

Synonyms and similarity for two previous words – deprecate and depreciate

I continue to do the tedious work of accumulating synonyms (from a single source, with some problems I’m finding) and now have 1772 synonym lists (76 are the same root) and 515 (mostly from vocabulary) still to extract. I’m only getting synonyms for words that match my vocabulary list but sometimes the synonym source has separate lists (and sometimes same) for inflections and derivatives so that’s why my synonym list is now almost as large as my vocabulary list. I’m glad I’ve done this much as it gives me much better guidance how I will add all this work (just done in an MSWord file with certain coding conventions) to my master XML vocabulary file.

How I’ll use the synonyms remains to be seen, but here’s a short analysis of just these two words.

I’m trying two different approaches to using synonyms to define “similarity” between words (to then use that measure to build quizzes, avoiding bad ambiguities and deliberately creating other ambiguity to make a quiz more difficult). My first approach is a synonym tree – i.e. for a word, get all its synonyms (just those in my vocabulary, i.e. ignoring more common words) and then find the synonyms for those words and iterate. This creates a “tree” with degrees of separation from the original word where I score all the synonyms I find by decreasing weight (degree^n), but then higher when finding the same word on different branches of the tree. I’ve already found that more than a few degrees of separation leads to silly (useless) results, so about three degrees of separation is best. But ignoring the common words may (and does) bias my results.
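The tree approach can be sketched as a level-by-level walk. This is only an approximation of the idea, under stated assumptions: the weight for degree d is taken to be d raised to a negative power (my real weights are tuned and differ), a word’s score is simply the sum of the weights of every branch it is found on, and the names `synonym_tree`, `max_degree` and `decay` are made up here.

```python
from collections import defaultdict


def synonym_tree(root, synonyms, max_degree=3, decay=1.0):
    """Walk synonyms outward from root and score every word found.

    synonyms maps each vocabulary word to its synonym list; synonyms
    that are not themselves keys (the common words) are ignored.
    Returns {word: (score, degree_first_found)}.
    """
    scores = defaultdict(float)
    first_found = {}
    expanded = {root}
    frontier = [root]
    for degree in range(1, max_degree + 1):
        weight = degree ** -decay  # assumed form of the decreasing weight
        next_frontier = []
        for word in frontier:
            for syn in synonyms.get(word, []):
                if syn not in synonyms:
                    continue  # common word outside the vocabulary
                scores[syn] += weight  # repeated finds raise the score
                first_found.setdefault(syn, degree)
                if syn not in expanded:
                    expanded.add(syn)
                    next_frontier.append(syn)
        frontier = next_frontier
    return {word: (scores[word], first_found[word]) for word in scores}
```

Each word is only expanded once, but it is scored every time it turns up, so a word reached along several branches ends up with a higher score. The root itself can also be found again at degree two, when one of its synonyms lists it back.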

My other approach is simply to compare a word with all other words scoring the synonyms for how much they overlap which then includes the common words. So now I have enough data I can use these two words as a simple test.

Here’s the synonym tree for depreciate:

level [1] 15 terms

decry(36,2) asperse(10,0) calumniate(13,0) censure(47,20) contemn(4,1) defame(25,4) denigrate(20,0) denounce(47,4) deprecate(14,1) derogate(7,0) discountenance(21,3) disparage(39,0) malign(29,11) traduce(10,0) vilify(36,2)

 level [2] 48 terms

reprehend(11,0) reprobate(15,2) rebuke(51,6) admonishment(8,0) admonition(13,0) castigation(7,1) obloquy(15,6) remonstrance(7,0) reprehension(42,7) reproach(46,2) reproof(11,0) stricture(7,0) admonish(21,2) berate(16,0) castigate(33,3) animadvert(117,11) cavil(3,0) impugn(27,2) ostracize(19,0) remonstrate(28,0) reprove(13,0) upbraid(14,0) disdain(40,6) besmirch(11,1) belie(29,2) stigmatize(10,0) vituperate(25,0) excoriate(32,0) adjudicate(8,0) declaim(19,2) proscribe(23,1) expostulate(5,1) disconcert(31,2) abash(3,0) discomfit(35,0) antipathetic(7,0) baleful(20,1) baneful(20,2) deleterious(16,3) inimical(18,1) malefic(17,0) maleficent(22,3) malevolent(25,1) noxious(28,4) pernicious(38,3) rancorous(11,1) assail(20,0) debase(45,8)

 level [3] 84 terms

wanton(72,11) incorrigible(16,2) berating(16,0) comeuppance(4,2) expostulation(11,3) objurgation(41,0) reproval(6,0) upbraiding(14,0) retribution(16,1) calumny(3,0) animadversion(10,0) aspersion(16,0) ignominy(10,1) invective(22,5) vituperation(15,0) attribution(5,2) diatribe(16,1) disapprobation(7,0) disparagement(27,2) imputation(5,0) inculpation(40,0) reprobation(39,0) odium(34,4) opprobrium(15,0) enjoin(37,1) exhort(25,1) lambaste(40,1) scarify(52,2) scourge(37,1) construe(15,0) elucidate(15,0) interpose(13,0) opine(15,0) fulminate(23,0) execrate(22,3) descry(14,0) espy(12,0) demur(27,1) inveigh(22,0) recriminate(32,3) contravene(35,2) gainsay(20,1) antipathy(25,4) contumely(11,0) haughtiness(15,1) hauteur(18,0) insolence(25,3) superciliousness(39,5) defile(33,4) confute(24,2) controvert(20,1) bloviate(44,4) perorate(6,0) interdict(12,0) dissuade(16,0) confound(33,0) nonplus(37,0) calamitous(23,1) pestilent(24,0) pestilential(20,1) nocent(11,0) nocuous(5,0) prejudicious(16,0) oppugnant(10,0) heinous(34,3) hideous(40,1) nefarious(35,3) waspish(15,4) fetid(28,2) insalubrious(18,0) noisome(36,0) pestiferous(12,0) iniquitous(5,0) miasmatic(13,3) miasmic(19,2) acrimonious(31,7) demean(18,0) abase(14,0) debauch(26,1) debilitate(23,4) deprave(13,0) adulterate(38,1) vitiate(15,3) bestialize(10,0)

Note that the total number of words, within three degrees of separation, is 258. But also, critically, deprecate is a first level synonym for depreciate so that entire branch of the tree, from the first level down, will be the same. And most of the synonyms, from these first three degrees of separation, do have some connection with depreciate itself but by the third degree the words are beginning to be fairly remote.

Now also note if I extend the tree to 15 degrees of separation I get 1208 or about two-thirds of my entire list. The top 10, with the scores based on my current “best” weights are:

(censure 5.147 1) (reproach 3.247 2) (vilify 3.127 1) (malign 3.084 1) (rebuke 3.075 2) (disparage 2.770 1) (defame 2.734 1) (denounce 2.720 1) (berate 2.424 2) (asperse 2.342 1)

where two of the synonyms (rebuke, berate) are found at the second degree of separation. The somewhat larger list of “top” synonyms is below (score is the second column, the degree where first found is the third column, and the number found, over all degrees, is the fourth column):

censure 4.613 1 37
reproach 3.124 2 26
vilify 3.014 1 16
rebuke 2.935 2 23
malign 2.864 1 14
denounce 2.664 1 16
disparage 2.584 1 12
defame 2.564 1 11
berate 2.261 2 16
asperse 2.229 1 9
decry 2.157 1 10
castigate 2.149 2 16
upbraid 2.148 2 15
denigrate 2.117 1 8
traduce 2.062 1 8
pernicious 1.982 2 15
calumniate 1.950 1 7
deprecate 1.877 1 8
noxious 1.833 2 14
reprove 1.773 2 11
derogate 1.745 1 6
admonish 1.644 2 11
obloquy 1.608 2 12
discountenance 1.542 1 6
deleterious 1.495 2 10
contemn 1.467 1 5
reproof 1.457 2 9
vituperate 1.455 2 8
reprehend 1.417 2 7

censure is one of the words where I think the source is a little confusing; it simply includes way too many, even distantly related, “synonyms” and so I believe this impacts this method of analysis a bit.

Now looking at deprecate (which is missing the sense of financial meanings that depreciate has) here’s its smaller tree:

level [1] 5 terms

depreciate(63,11) derogate(7,0) discountenance(21,3) disparage(39,1) expostulate(5,2)

 level [2] 17 terms

decry(36,2) asperse(10,0) calumniate(13,0) censure(47,19) contemn(4,0) defame(25,4) denigrate(20,0) denounce(47,4) malign(29,11) traduce(10,0) vilify(36,2) disconcert(31,2) abash(3,0) discomfit(35,0) disdain(40,7) dissuade(16,1) remonstrate(28,3)

 level [3] 55 terms

reprehend(11,0) reprobate(15,2) rebuke(51,6) admonishment(8,0) admonition(13,0) castigation(7,1) obloquy(15,6) remonstrance(7,0) reprehension(42,6) reproach(46,2) reproof(11,0) stricture(7,0) admonish(21,1) berate(16,0) castigate(33,3) animadvert(117,8) cavil(3,0) impugn(27,2) ostracize(19,0) reprove(13,0) upbraid(14,0) besmirch(11,1) belie(29,2) stigmatize(10,0) vituperate(25,0) excoriate(32,0) adjudicate(8,0) declaim(19,2) proscribe(23,1) antipathetic(7,0) baleful(20,1) baneful(20,2) deleterious(16,3) inimical(18,1) malefic(17,0) maleficent(22,3) malevolent(25,1) noxious(28,4) pernicious(38,3) rancorous(11,1) assail(20,0) debase(45,8) confound(33,0) nonplus(37,0) antipathy(25,5) contumely(11,0) disparagement(27,2) haughtiness(15,1) hauteur(18,0) insolence(25,3) superciliousness(39,5) exhort(25,1) demur(27,1) inveigh(22,0) recriminate(32,3)

Note that depreciate is a synonym of deprecate at first degree of separation so that entire branch of this tree will be the same.  And given the close similarity of these words (depreciate is really a replacement for deprecate) it’s interesting that this source has so many more synonyms for depreciate (most of which could be synonyms of deprecate) and deprecate has only one unique synonym, expostulate, which really doesn’t fit very well within the meaning.
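
For anyone wanting to reproduce the level-by-level listing, here is a minimal sketch of how such a tree might be built. It is not my actual code: the function name `synonym_levels` and the tiny `SYNONYMS` dictionary are made up for illustration, and the links in it are invented rather than copied from the source. The idea is a breadth-first walk where each level holds the terms first reached at that degree of separation.

```python
def synonym_levels(word, synonyms, max_level=3):
    """Return {level: [terms]}, where level is the degree of separation from word."""
    seen = {word}          # terms already placed at some level
    frontier = [word]      # terms found at the previous level
    levels = {}
    for level in range(1, max_level + 1):
        next_frontier = []
        for term in frontier:
            for syn in synonyms.get(term, []):
                if syn not in seen:      # keep each term only at its first level
                    seen.add(syn)
                    next_frontier.append(syn)
        if not next_frontier:
            break
        levels[level] = next_frontier
        frontier = next_frontier
    return levels


# Hypothetical toy data: invented links, NOT the real lists from the source.
SYNONYMS = {
    "deprecate": ["depreciate", "derogate", "expostulate"],
    "depreciate": ["decry", "deprecate", "derogate", "malign"],
    "derogate": ["decry", "disparage"],
    "expostulate": ["dissuade", "remonstrate"],
    "decry": ["censure", "depreciate"],
}

levels = synonym_levels("deprecate", SYNONYMS)
for level, terms in levels.items():
    print(f"level [{level}] {len(terms)} terms: {' '.join(terms)}")
```

Because depreciate sits at level 1 of deprecate's tree, everything reachable from depreciate shows up one level further down, which is why that branch is shared between the two words.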

In short, for these words, the synonym tree is fairly useless. I did a lot of other analysis (fairly tedious to do by hand, will have to add to my code) but now, as I attempt to write up results, I realize most of what I found is just an artifact of the poor synonym lists of this source (one of its many problems), so I’ll halt here and attempt to find a better pair at some point (especially after automating the analysis).

 

Another single letter: deprecate and depreciate

deprecate and depreciate have frequently confused me. I have a fairly good idea of the meanings of each, but often am not sure which one I really want to use. One inhibiting factor is my financial background, where depreciate has such a specific meaning that I sometimes neglect its broader meanings and the possibility of wider use. So let’s look at these:

deprecate, a verb with definitions from three different sources:

 1) express disapproval of

2) another term for DEPRECIATE (sense 2: disparage or belittle (something))

to express severe disapproval of another’s action

1) to express earnest disapproval of

2) to urge reasons against; protest against (a scheme, purpose, etc.)

3) to depreciate; belittle

4) {archaic} to pray for deliverance from

Now right away we have the issue that my first source effectively says that deprecate is the same as sense 2 of depreciate, whose definitions we’ll show below:

 1) diminish in value over a period of time

<special usage> reduce the recorded value in a company’s books of (an asset) each year over a predetermined period

2) disparage or belittle (something)

 

1) to reduce the purchasing value of (money)

2) to lessen the value or price of

3) to claim depreciation on (a property) for tax purposes

4) to represent as of little value or merit; belittle

The first pair of definitions is from the same source as the first definition of deprecate, so its sense 2) is the one that the second definition of deprecate refers to.

So these sources are saying depreciate really is the same as deprecate but with the added meanings of the financial sort. I wonder how this happened – a word gains new meanings but also gains a letter.

A usage note in dictionary.reference.com provides the following:

An early and still the most current sense of deprecate is “to express disapproval of.” In a sense development still occasionally criticized by a few, deprecate has come to be synonymous with the similar but etymologically unrelated word depreciate in the sense

That’s interesting but not quite enough. What more can I find?

For depreciate two sources list these etymologies:

Late Middle English: from late Latin depreciat- ‘lowered in price, undervalued’, from the verb depreciare, from Latin de- ‘down’ + pretium ‘price’.

and

1640-50; < Late Latin dēpretiātus undervalued (past participle of dēpretiāre, in Medieval Latin spelling dēpreciāre), equivalent to Latin dē- de- + preti (um) price + -ātus -ate

IOW, these both cover the financial sense of the term, not the disparage or express disapproval sense, so let’s see about deprecate, from the same two sources:

Early 17th century (in the sense ‘pray against’): from Latin deprecat- ‘prayed against (as being evil)’, from the verb deprecari, from de- (expressing reversal) + precari ‘pray’.

and

1615-25; < Latin dēprecātus prayed against, warded off (past participle of dēprecārī), equivalent to dē- de- + prec (ārī) to pray + -ātus -ate

So it appears each word started with just one sense, and these are quite different, but over time (unclear when/how) deprecate has subsumed some of the meaning of depreciate. Sounds to me like this is really the same confusion I’ve always had.

So, both through logic and from the usage note (“a sense development still occasionally criticized by a few”), I conclude the few are right: keep them separate. When you’re talking about loss of value of something, use depreciate, and when you mean disparaging or disapproving, use deprecate. That’s actually what I thought (sorta) but now I’ve convinced myself of this via a little research.

Also, as a work-still-in-progress, I’m trying to use my synonym datafile and code to see what degree of similarity they would indicate, but that will be another post.

anaphora antistrophe aposiopesis ellipsis oxymoron tropology: really the same?

I’ve been accumulating synonyms, from a particular source, for the words in my vocabulary to see if I can do various processing on the synonym lists to determine “similarity” between words, a metric I’ll then use in various ways in composing quizzes. One benefit of doing the tedious process of data entry myself is I begin to notice patterns, especially characteristics of this particular source of synonyms. Do these patterns then suggest some flaws in that source?

According to my synonym source these words have exactly the same synonyms which would imply they are very similar, if not identical – but is this true? And these words: metonymy,  onomatopoeia,  malapropism,  alliteration,  adumbration have almost exactly the same synonyms as the first six. Eleven almost identical words – seems crazy, so let’s see how close their definitions are:

anaphora

n.
1. [grammar] the use of a word referring to or replacing a word used earlier in a sentence, to avoid repetition
2. [rhetoric] the repetition of a word or phrase at the beginning of successive clauses

antistrophe

n.
the second section of an ancient Greek choral ode or of one division of it

aposiopesis

n.
[rhetoric] the device of suddenly breaking off in speech

ellipsis

n.
the omission from speech or writing of a word or words that are superfluous or able to be understood from contextual clues

oxymoron

n.
a figure of speech in which apparently contradictory terms appear in conjunction

a phrase made by combining two words that are contradictory or incongruous

a figure of speech by which a locution produces an incongruous seemingly self-contradictory effect

tropology

n.
the figurative use of language
<special usage> (Christian theology) the figurative interpretation of the scriptures as a source of moral guidance

So of these, ellipsis and anaphora do seem quite close, but the rest are not that connected. aposiopesis has some connection with ellipsis and anaphora, but antistrophe and tropology seem quite unrelated and also have very specialized meanings. And of course oxymoron, a widely used word, seems to have essentially no connection (other than being descriptive of speech patterns).

Let’s look at the other five:

metonymy

n.
the substitution of the name of an attribute or adjunct for that of the thing meant, for example suit for business executive, or the track for horse racing
[rhetoric] a figure of speech that consists of the use of the name of one object or concept for that of another to which it is related, or of which it is a part, as “scepter” for “sovereignty” or “the bottle” for “strong drink” or “count heads (or noses)” for “count people”

onomatopoeia

n.
the formation of a word from a sound associated with what is named
<special usage> the use of such words for rhetorical effect

words that sound like, or suggest, their meaning

1. the formation of a word, as cuckoo, meow, honk, or boom, by imitation of a sound made by or associated with its referent
2. a word so formed
3. the use of imitative and naturally suggestive words for rhetorical, dramatic, or poetic effect

malapropism

n.
the mistaken use of a word in place of a similar-sounding one, often with unintentionally amusing effect

deliberate misuse of a word or mangling of the English language, often done for comic effect

1. the unintentional misuse of a word by confusion with one of similar sound, esp. when creating a ridiculous effect
2. the habit of misusing words in this manner

alliteration

n.
the occurrence of the same letter or sound at the beginning of adjacent or closely connected words

1. the commencement of two or more stressed syllables of a word group either with the same consonant sound or sound group, or with a vowel sound that may differ from syllable to syllable
2. the commencement of two or more words of a word group with the same letter

the repetition of similar sounds, especially at the beginnings of words, in written speech or the spoken word

adumbration

v.
report or represent in outline
<special usage> indicate faintly;  foreshadow or symbolize; overshadow

1. to produce a faint image or resemblance of; to outline or sketch
2. to foreshadow; prefigure
3. to darken or conceal partially; overshadow

So adumbration really has almost no connection at all with any of the other ten, so synonyms as similarity really fail there. metonymy seems to have some connection, but onomatopoeia and malapropism (which are vaguely related to each other) and alliteration have very little connection to any of the first six.

Now at least ten of the words have something to do with patterns of speech (but so do lots of other words), so I think this synonym source is doing a disservice by treating these as being as similar as it does. Doing a little reverse engineering, I think this apparent similarity is attributable to their data structures and methods of retrieving “synonyms”: they appear to grab all the synonyms of a particular sense/meaning of the word. In terms of the experiment I’m doing with synonym trees, the source is doing something similar, i.e. including all of the next degree of separation.

Now while this pattern is particularly strong for these eleven words it is also a device they use in other words, even though those might only be a fraction of the total list of synonyms. In short, they’re generating too much overlap.
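
Spotting this pattern by eye during data entry is slow, so here is a rough sketch of how it might be automated. The function name `identical_synonym_groups` and the sample dictionary are hypothetical, and the placeholder synonyms are invented, not taken from the source. It assumes two words count as suspicious when their synonym lists contain exactly the same entries, ignoring order.

```python
from collections import defaultdict


def identical_synonym_groups(synonyms):
    """Group words whose synonym lists hold exactly the same entries."""
    groups = defaultdict(list)
    for word, syns in synonyms.items():
        # frozenset ignores order and duplicates, and can be used as a dict key
        groups[frozenset(syns)].append(word)
    # only groups with more than one word are interesting
    return [sorted(words) for words in groups.values() if len(words) > 1]


# Hypothetical sample: placeholder synonyms, NOT the real lists from the source.
SYNONYMS = {
    "anaphora": ["figure of speech", "trope", "turn of phrase"],
    "oxymoron": ["trope", "figure of speech", "turn of phrase"],
    "ellipsis": ["turn of phrase", "trope", "figure of speech"],
    "metonymy": ["figure of speech", "trope", "turn of phrase", "image"],
    "censure": ["rebuke", "reproach"],
}

groups = identical_synonym_groups(SYNONYMS)
print(groups)
```

In this sample metonymy is left out of the group because it has one extra entry, which is the "almost exactly the same" case; catching that needs an overlap measure rather than strict equality.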

Now in terms of my practical requirements detecting this similarity would be a good reason to include some of these terms in a quiz for one of the others, not to exclude them because they might have the same definitions. So I think I can use synonyms (possibly both list vector comparison and trees) as measures of similarity, but then I’m actually back to the original possible issue and that is comparing definitions themselves for similarity (not literally, but approximately, as with search). At least the synonyms might provide a smaller set of definitions to have to examine and that’s good as the comparison is slow, per definition, and the comparison for the entire vocabulary is N^2 so I can’t just do brute force, especially during real-time generation of the quiz.
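
As a sketch of the "smaller set" idea, the synonym lists could be used as a cheap first pass that shortlists candidates before any slow definition comparison is run. This is only an illustration under my own assumptions: I use Jaccard overlap (shared synonyms divided by total distinct synonyms) as the similarity measure, which is one possible choice and not necessarily what I'll end up with, and the names `jaccard`, `shortlist`, the 0.5 threshold and the sample data are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two synonym lists: |shared| / |combined|, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shortlist(word, synonyms, threshold=0.5):
    """Return (other_word, score) pairs whose synonym overlap meets the threshold."""
    target = synonyms[word]
    scored = [
        (other, jaccard(target, syns))
        for other, syns in synonyms.items()
        if other != word
    ]
    keep = [pair for pair in scored if pair[1] >= threshold]
    # best matches first, ties broken alphabetically
    return sorted(keep, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample: placeholder synonyms, NOT the real lists from the source.
SYNONYMS = {
    "anaphora": ["figure of speech", "trope", "turn of phrase"],
    "oxymoron": ["trope", "figure of speech", "turn of phrase"],
    "metonymy": ["figure of speech", "trope", "turn of phrase", "image"],
    "censure": ["rebuke", "reproach"],
}

candidates = shortlist("anaphora", SYNONYMS)
print(candidates)
```

Only the words on the shortlist would then go through the expensive definition comparison, so the slow step runs on a handful of candidates per quiz word instead of on the whole vocabulary.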

As to the value of any of these words as Word Of The Day, most of them seem sufficiently obscure to be something any of us would rarely use, although obviously alliteration, malapropism and oxymoron seem more useful. adumbration, onomatopoeia and ellipsis seem like the kind of words you might encounter in some advanced test, so these six are probably worth learning.