I continue the tedious work of accumulating synonyms (from a single source, which I’m finding has some problems) and now have 1772 synonym lists (76 are for the same root), with 515 (mostly from vocabulary) still to extract. I’m only getting synonyms for words that match my vocabulary list, but the synonym source sometimes has separate lists (and sometimes the same list) for inflections and derivatives, which is why my synonym list is now almost as large as my vocabulary list. I’m glad I’ve done this much, as it gives me much better guidance on how to add all this work (so far just an MSWord file with certain coding conventions) to my master XML vocabulary file.
How I’ll use the synonyms remains to be seen, but here’s a short analysis of just two words: depreciate and deprecate.
I’m trying two different approaches to using synonyms to define “similarity” between words. I’ll then use that measure to build quizzes, avoiding bad ambiguities and deliberately creating others (to make a quiz more difficult). My first approach is a synonym tree: for a word, get all its synonyms (just those in my vocabulary, i.e. ignoring more common words), then find the synonyms for those words, and iterate. This creates a “tree” with degrees of separation from the original word. I score every synonym I find with a weight that decreases with its degree (degree^n), but the score goes up when the same word is found on different branches of the tree. I’ve already found that going beyond a few degrees of separation leads to silly (useless) results, so about three degrees of separation is best. But ignoring the common words may (and does) bias my results.
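As a rough illustration of the synonym tree, here is a minimal Python sketch. It is not my actual code: the function name, the `decay` parameter, and the geometric weighting (`decay ** (degree - 1)`) are assumptions made for the example, and the tiny `toy_synonyms` dictionary uses placeholder words rather than real data. Each word is expanded only once, but every time it turns up on another branch its score increases.

```python
from collections import defaultdict


def synonym_tree_scores(word, synonyms, max_degree=3, decay=0.5):
    """Walk the synonym lists level by level, starting from `word`.

    `synonyms` maps a word to its list of synonyms.
    Returns (scores, first_degree):
      scores[w]       accumulated weight of w over every branch it appears on
      first_degree[w] degree of separation at which w was first found
    The weighting is an assumed stand-in, not the weights used in the post.
    """
    scores = defaultdict(float)
    first_degree = {}
    seen = {word}
    frontier = [word]
    for degree in range(1, max_degree + 1):
        weight = decay ** (degree - 1)  # weight shrinks as degree grows
        next_frontier = []
        for parent in frontier:
            for syn in synonyms.get(parent, ()):
                if syn == word:
                    continue  # never score the starting word itself
                scores[syn] += weight  # repeats on other branches add up
                if syn not in seen:
                    seen.add(syn)
                    first_degree[syn] = degree
                    next_frontier.append(syn)  # expand each word only once
        frontier = next_frontier
    return dict(scores), first_degree


# Placeholder data, purely for illustration.
toy_synonyms = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": ["a"],
}
toy_scores, toy_first = synonym_tree_scores("a", toy_synonyms)
```

Here "c" is found both directly and through "b", so it ends up with a higher score than "b" even though both are first-degree synonyms.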
My other approach is simply to compare a word with every other word, scoring how much their synonym lists overlap; this comparison does include the common words. Now that I have enough data, I can use these two words as a simple test.
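The overlap comparison could be sketched like this. The post does not say which overlap measure is used, so the Jaccard ratio (shared words divided by all words in either list) is an assumption, as are the function names. Adding each word to its own set is also an assumption, made so that two words listing each other count as overlapping. The `toy_lists` entries are abridged or invented for illustration and are not the real source lists.

```python
def overlap_score(word_a, word_b, synonyms):
    """Jaccard overlap of two synonym lists (an assumed measure).

    Each word is added to its own set so that a pair of words that
    list each other as synonyms still shares those entries.
    """
    set_a = set(synonyms.get(word_a, ())) | {word_a}
    set_b = set(synonyms.get(word_b, ())) | {word_b}
    return len(set_a & set_b) / len(set_a | set_b)


def most_similar(word, synonyms, top=10):
    """Rank every other headword by its overlap with `word`."""
    ranked = [
        (other, overlap_score(word, other, synonyms))
        for other in synonyms
        if other != word
    ]
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top]


# Toy lists, abridged or invented for illustration only.
toy_lists = {
    "depreciate": ["decry", "disparage", "deprecate", "derogate"],
    "deprecate": ["depreciate", "derogate", "disparage", "expostulate"],
    "abash": ["disconcert"],
}
toy_ranking = most_similar("depreciate", toy_lists)
```

Unlike the tree, this comparison never filters the lists against the vocabulary, so common words contribute to the score.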
Here’s the synonym tree for depreciate:
level [1] 15 terms
decry(36,2) asperse(10,0) calumniate(13,0) censure(47,20) contemn(4,1) defame(25,4) denigrate(20,0) denounce(47,4) deprecate(14,1) derogate(7,0) discountenance(21,3) disparage(39,0) malign(29,11) traduce(10,0) vilify(36,2)
level [2] 48 terms
reprehend(11,0) reprobate(15,2) rebuke(51,6) admonishment(8,0) admonition(13,0) castigation(7,1) obloquy(15,6) remonstrance(7,0) reprehension(42,7) reproach(46,2) reproof(11,0) stricture(7,0) admonish(21,2) berate(16,0) castigate(33,3) animadvert(117,11) cavil(3,0) impugn(27,2) ostracize(19,0) remonstrate(28,0) reprove(13,0) upbraid(14,0) disdain(40,6) besmirch(11,1) belie(29,2) stigmatize(10,0) vituperate(25,0) excoriate(32,0) adjudicate(8,0) declaim(19,2) proscribe(23,1) expostulate(5,1) disconcert(31,2) abash(3,0) discomfit(35,0) antipathetic(7,0) baleful(20,1) baneful(20,2) deleterious(16,3) inimical(18,1) malefic(17,0) maleficent(22,3) malevolent(25,1) noxious(28,4) pernicious(38,3) rancorous(11,1) assail(20,0) debase(45,8)
level [3] 84 terms
wanton(72,11) incorrigible(16,2) berating(16,0) comeuppance(4,2) expostulation(11,3) objurgation(41,0) reproval(6,0) upbraiding(14,0) retribution(16,1) calumny(3,0) animadversion(10,0) aspersion(16,0) ignominy(10,1) invective(22,5) vituperation(15,0) attribution(5,2) diatribe(16,1) disapprobation(7,0) disparagement(27,2) imputation(5,0) inculpation(40,0) reprobation(39,0) odium(34,4) opprobrium(15,0) enjoin(37,1) exhort(25,1) lambaste(40,1) scarify(52,2) scourge(37,1) construe(15,0) elucidate(15,0) interpose(13,0) opine(15,0) fulminate(23,0) execrate(22,3) descry(14,0) espy(12,0) demur(27,1) inveigh(22,0) recriminate(32,3) contravene(35,2) gainsay(20,1) antipathy(25,4) contumely(11,0) haughtiness(15,1) hauteur(18,0) insolence(25,3) superciliousness(39,5) defile(33,4) confute(24,2) controvert(20,1) bloviate(44,4) perorate(6,0) interdict(12,0) dissuade(16,0) confound(33,0) nonplus(37,0) calamitous(23,1) pestilent(24,0) pestilential(20,1) nocent(11,0) nocuous(5,0) prejudicious(16,0) oppugnant(10,0) heinous(34,3) hideous(40,1) nefarious(35,3) waspish(15,4) fetid(28,2) insalubrious(18,0) noisome(36,0) pestiferous(12,0) iniquitous(5,0) miasmatic(13,3) miasmic(19,2) acrimonious(31,7) demean(18,0) abase(14,0) debauch(26,1) debilitate(23,4) deprave(13,0) adulterate(38,1) vitiate(15,3) bestialize(10,0)
Note that the total number of words, within three degrees of separation, is 258. But also, critically, deprecate is a first level synonym for depreciate so that entire branch of the tree, from the first level down, will be the same. And most of the synonyms, from these first three degrees of separation, do have some connection with depreciate itself but by the third degree the words are beginning to be fairly remote.
Note also that if I extend the tree to 15 degrees of separation I get 1208 words, or about two-thirds of my entire list. The top 10, with scores based on my current “best” weights, are:
(censure 5.147 1) (reproach 3.247 2) (vilify 3.127 1) (malign 3.084 1) (rebuke 3.075 2) (disparage 2.770 1) (defame 2.734 1) (denounce 2.720 1) (berate 2.424 2) (asperse 2.342 1)
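where three of the synonyms (reproach, rebuke, berate) are first found at the second degree of separation. The somewhat larger list of “top” synonyms follows (the score is the second column, the degree at which the word was first found is the third column, and the number of times it was found, over all degrees, is the fourth column):
word | score | first degree | count |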
censure | 4.613 | 1 | 37 |
reproach | 3.124 | 2 | 26 |
vilify | 3.014 | 1 | 16 |
rebuke | 2.935 | 2 | 23 |
malign | 2.864 | 1 | 14 |
denounce | 2.664 | 1 | 16 |
disparage | 2.584 | 1 | 12 |
defame | 2.564 | 1 | 11 |
berate | 2.261 | 2 | 16 |
asperse | 2.229 | 1 | 9 |
decry | 2.157 | 1 | 10 |
castigate | 2.149 | 2 | 16 |
upbraid | 2.148 | 2 | 15 |
denigrate | 2.117 | 1 | 8 |
traduce | 2.062 | 1 | 8 |
pernicious | 1.982 | 2 | 15 |
calumniate | 1.950 | 1 | 7 |
deprecate | 1.877 | 1 | 8 |
noxious | 1.833 | 2 | 14 |
reprove | 1.773 | 2 | 11 |
derogate | 1.745 | 1 | 6 |
admonish | 1.644 | 2 | 11 |
obloquy | 1.608 | 2 | 12 |
discountenance | 1.542 | 1 | 6 |
deleterious | 1.495 | 2 | 10 |
contemn | 1.467 | 1 | 5 |
reproof | 1.457 | 2 | 9 |
vituperate | 1.455 | 2 | 8 |
reprehend | 1.417 | 2 | 7 |
censure is one of the words where I think the source is a little confusing; it simply includes way too many, even distantly related, “synonyms” and so I believe this impacts this method of analysis a bit.
Now looking at deprecate (which lacks the financial sense that depreciate has), here’s its smaller tree:
level [1] 5 terms
depreciate(63,11) derogate(7,0) discountenance(21,3) disparage(39,1) expostulate(5,2)
level [2] 17 terms
decry(36,2) asperse(10,0) calumniate(13,0) censure(47,19) contemn(4,0) defame(25,4) denigrate(20,0) denounce(47,4) malign(29,11) traduce(10,0) vilify(36,2) disconcert(31,2) abash(3,0) discomfit(35,0) disdain(40,7) dissuade(16,1) remonstrate(28,3)
level [3] 55 terms
reprehend(11,0) reprobate(15,2) rebuke(51,6) admonishment(8,0) admonition(13,0) castigation(7,1) obloquy(15,6) remonstrance(7,0) reprehension(42,6) reproach(46,2) reproof(11,0) stricture(7,0) admonish(21,1) berate(16,0) castigate(33,3) animadvert(117,8) cavil(3,0) impugn(27,2) ostracize(19,0) reprove(13,0) upbraid(14,0) besmirch(11,1) belie(29,2) stigmatize(10,0) vituperate(25,0) excoriate(32,0) adjudicate(8,0) declaim(19,2) proscribe(23,1) antipathetic(7,0) baleful(20,1) baneful(20,2) deleterious(16,3) inimical(18,1) malefic(17,0) maleficent(22,3) malevolent(25,1) noxious(28,4) pernicious(38,3) rancorous(11,1) assail(20,0) debase(45,8) confound(33,0) nonplus(37,0) antipathy(25,5) contumely(11,0) disparagement(27,2) haughtiness(15,1) hauteur(18,0) insolence(25,3) superciliousness(39,5) exhort(25,1) demur(27,1) inveigh(22,0) recriminate(32,3)
Note that depreciate is a synonym of deprecate at the first degree of separation, so that entire branch of this tree will be the same. And given the close similarity of these words (depreciate is really a replacement for deprecate), it’s interesting that this source has so many more synonyms for depreciate (most of which could be synonyms of deprecate), while deprecate has only one unique synonym, expostulate, which really doesn’t fit the meaning very well.
In short, for these words, the synonym tree is fairly useless. I did a lot of other analysis (fairly tedious to do by hand; I will have to add it to my code), but now, as I attempt to write up the results, I realize most of what I found is just an artifact of this source’s poor synonym lists (one of its many problems). So I’ll halt here and attempt to find a better pair at some point (especially after automating the analysis).