Word Count

From Librivox wiki
Revision as of 22:31, 14 April 2023 by Msfry (talk | contribs) (→‎Websites)

LibriVox member peegee has written a script for web browsers that may make the Book Coordinator's (BC's) job of compiling word counts for the Magic Window a little easier.

How it Works

The script runs against the HTML ebooks on Project Gutenberg.

  1. You click the paragraph where you want to start the count.
  2. It asks you for the target number of words.
  3. It quickly goes through every paragraph from that point onwards, counting the words in each and keeping a running total.
  4. It stops when it reaches the target, or at the end of the chapter if that comes first.
  5. The page is temporarily changed to display the word counts at the end of each paragraph.
  6. You can repeat this as many times as you like. Each time you click a paragraph, the previous temporary page changes are removed.
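
The counting in steps 3 and 4 can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not peegee's actual code: the function name `countFrom`, its parameters, and the whitespace-based word split are all assumptions made for this example, and it works on a plain array of strings rather than on the page itself.

```javascript
// Hypothetical helper (not taken from peegee's script): walks the paragraphs
// from startIndex, recording each paragraph's word count and the running
// total, and stops once the running total reaches targetWords.
function countFrom(paragraphs, startIndex, targetWords) {
  const rows = [];
  let runningTotal = 0;
  for (let i = startIndex; i < paragraphs.length; i++) {
    const text = paragraphs[i].trim();
    // Assumption: a "word" is any run of non-whitespace characters.
    const words = text === '' ? 0 : text.split(/\s+/).length;
    runningTotal += words;
    rows.push({ index: i, words, runningTotal });
    if (runningTotal >= targetWords) break; // target reached
  }
  return rows; // ends early at the last paragraph if the target is never reached
}

// Usage: start at the second paragraph and aim for 5 words.
const sample = ['One two three.', 'Four five.', 'Six seven eight nine.', 'Ten.'];
const rows = countFrom(sample, 1, 5);
console.log(rows);
```

In this sample the count stops after the third paragraph, because the running total (2, then 6) passes the target of 5 there.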

Installing the Script

The installation method depends on the browser. Firefox and Chrome may need to be restarted after installation; Opera does not.

Firefox

  1. You will first need the Greasemonkey add-on for Firefox, which is available from this link.
  2. Once you have Greasemonkey installed, go here and click the Install button.

Google Chrome

  1. First, install Tampermonkey from the Chrome store.
  2. Then install the word count script from here.

Opera

  1. To enable User JavaScript, go to Tools > Preferences > Advanced > Content > JavaScript options and select the directory where you will put your User JavaScript files (preferably a new folder with nothing else in it).
  2. Then go to the script.
  3. Click the Install button to open the script in a new tab in Opera.
  4. From the Opera menu, click File > Save As to save it into the folder you chose in the first step. It is probably best to keep the suggested file name; whatever name you choose, it MUST end in .user.js.

Limitations

The script only works on the Gutenberg online HTML books, not on the plain text, zipped HTML, or other formats.

Support

If you have any problems installing or using this script you can either post a message in this forum thread or send a Private Message to peegee.

Other Alternatives

Web Browser Bookmarklets

Gutencount (word counter)

This is a "universal" word counter bookmarklet. The code below will, with a single click, return the word count for the body of a Gutenberg book.

  • It counts the ebook text, excluding the Gutenberg disclaimer and legalese (everything between "*** START OF _____ ***" and "*** END OF ____ ***"). This will work whether you are on the main page, the HTML page (or as-submitted), or on the plain text page.
    • Slight discrepancies may be present due to extra text (Transcriber's notes, Book summary, subtitles, etc.) in some formats that is not present in the others.
  • BONUS: On the Gutenberg search results page, it will append the word count to all search results! (So, if you're looking for a short short story, you don't have to go counting every book that comes up in search.)
  • BONUS: On a normal webpage, it will count all the text on the page.
  • BONUS: If any text is highlighted, it will count the words in the selection.
  • NEW(2023-04-14): When on the HTML page of an ebook, click on the chapter heading to get a word count for that chapter.
  • Update(2023-04-11): Word/character count will now be displayed in a fixed box in the upper right corner of the window, instead of a pop-up alert. Double click to dismiss.
javascript:(function(){let a=![];function b(g,h,i=''){return c(g['match'](/(?<=\*{3} START.*?\*{3}).*(?=\*{3} END.*?\*{3})/s)[0x0]['trim'](),h,i);}function c(g,h,i=''){let j=g['trim']()['split'](/--|[\s\*—]+/)['length'];if(i)return j;f(j+'\x20words,\x20'+g['length']+'\x20chars\x0ain\x20'+h);}function d(g,h){fetch(g)['then'](i=>i['text']())['then'](i=>{let j=b(i,'ebook',h);h&&(h['innerHTML']+='wc:'+j);})['catch'](i=>{});}function e(){let g=document['location']['toString'](),h=window['getSelection']()+'';if(h['length']>0x0)c(h,'selection');else{if(g['match'](/gutenberg.org\/(files|cache\/epub)\/\d+/))b(document['body']['innerText'],'ebook');else{if(g['match'](/gutenberg.org\/ebooks\/(\d+)/))d(g+'.txt.utf-8',document['querySelector']('#cover'));else{if(g['match'](/gutenberg.org\/ebooks\/(subject|search)/))for(const i of document['querySelectorAll']('.booklink>a>span:nth-child(2)')){d(i['parentElement']['href']+'.txt.utf-8',i);}else!a&&c(document['body']['innerText'],'webpage');}}}}e();function f(g){let h=document['getElementById']('xcount')||document['createElement']('div');h['id']='xcount',h['style']='position:fixed;top:0;right:0;width:10em;height:2em;background:#333c;color:#fff;z-index:10000;padding:0.5em;text-align:right;',h['innerHTML']='',h['ondblclick']=function(){this['remove']();},document['body']['appendChild'](h),h['innerHTML']=g;}document['onclick']=function(g){let h=document['getSelection']();if(h){let i,j;try{i=h['anchorNode']['parentNode']['tagName']['match'](/^H\d/),j=h['anchorNode']['parentNode']['parentNode']['tagName']['match'](/^H\d/);}catch(k){}if(i||j){let 
l=document['createRange'](),m=j?h['anchorNode']['parentNode']['parentNode']:h['anchorNode']['parentNode'],n=j?h['anchorNode']['parentNode']['parentNode']['nextElementSibling']:h['anchorNode']['parentNode']['nextElementSibling'];while(n['nextElementSibling']){if(n['nextElementSibling']&&n['nextElementSibling']['tagName']['match'](/^(H\d|SECTION)/))break;n=n['nextElementSibling'];}l['setStartAfter'](m),l['setEndAfter'](n),h['addRange'](l);}}e();};}());
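
For readers who want to follow what the minified bookmarklet above does, here is a readable sketch of its core logic. The regular expressions are the ones that appear in the bookmarklet (with the em dash written as `\u2014`); the function names `extractBody`, `countWords`, and `pageKind`, the returned labels, and the null guard are assumptions added for this illustration. It is not the author's unminified source.

```javascript
// Returns the text between the "*** START ... ***" and "*** END ... ***"
// markers, or null if the markers are not found (the guard is an addition).
function extractBody(text) {
  const match = text.match(/(?<=\*{3} START.*?\*{3}).*(?=\*{3} END.*?\*{3})/s);
  return match ? match[0].trim() : null;
}

// Splits on double hyphens, whitespace, asterisks, and em dashes, as the
// bookmarklet does, and returns the number of resulting pieces.
function countWords(text) {
  return text.trim().split(/--|[\s*\u2014]+/).length;
}

// Classifies a URL using the same patterns, in the same order, as the bookmarklet.
function pageKind(url) {
  if (/gutenberg.org\/(files|cache\/epub)\/\d+/.test(url)) return 'ebook';
  if (/gutenberg.org\/ebooks\/(\d+)/.test(url)) return 'main';
  if (/gutenberg.org\/ebooks\/(subject|search)/.test(url)) return 'search';
  return 'webpage';
}

// Usage with a tiny made-up text in the Gutenberg layout.
const sampleText = [
  '*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***',
  'It was a dark--and stormy--night.',
  '*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***',
].join('\n');
const body = extractBody(sampleText);
console.log(body, countWords(body), pageKind('https://www.gutenberg.org/ebooks/1342'));
```

Because the split treats "--" as a separator, "dark--and" counts as two words, which is the em dash handling the update notes refer to.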

Chapter Counter (beta)

This script will (attempt to) count the number of words in indexed chapters of a Gutenberg book. Navigate to the HTML or the plain text page, and activate the script.

Current Limitations:

  • The Table of contents (TOC) must be present, and labeled Contents.
  • TOC should not include any sections that appear before the contents list (eg, Preface).
  • Does not work if chapter headings do not match the TOC (eg, TOC lists Chapter IV, but chapter headings appear as IV).
  • Does not work if the TOC is formatted strangely (eg page numbers or other text interspersed between the chapter titles).
  • Update 2023-04-14: Corrected the word count method so "em dashes" are accurately accounted for.
javascript:(function(){if(document['location']['toString']()['match'](/gutenberg.org\/(files|cache\/epub)\/\d+/)){let d=document['body']['innerText']['match'](/(?<=\*{3} ?START.*?\*{3}).*(?=\*{3} ?END.*?\*{3})/s)[0x0]['trim']()['split'](/\n+/),e=document['getSelection']()['toString'](),f=e?e:'contents',g=[],h='',i='',j={};for(var a of d){a=a['trim']();if(!a||a['length']==0x0)continue;if(!h&&a['match'](RegExp('^'+f,'i'))){h='INDEX';continue;}else{if(h=='INDEX')g['length']>0x0&&c(g[0x0],a)?(h='BODY',i=g[0x0],j[i]=''):g['push'](a);else{if(h=='BODY'){if(g[g['indexOf'](i)+0x1]!==undefined&&c(a,g[g['indexOf'](i)+0x1])){i=g[g['indexOf'](i)+0x1],j[i]='';continue;}j[i]+=a+'\x20';}}}}var b=document['getElementById']('xcount')||document['createElement']('div');b['id']='xcount',b['style']='position:fixed;top:0;right:0;width:20em;height:20em;overflow-y:scroll;background:#333c;color:#fff;',b['innerHTML']='',b['ondblclick']=function(){this['remove']();},document['body']['appendChild'](b);for(const k in j){b['innerHTML']+=k+':\x20'+j[k]['split'](/--|[\s\*—]+/)['length']+'\x20words<br/><br/>';}}function c(l,m){if(!l||!m)return![];if(l['toUpperCase']()['replace'](/\W/g,'')==m['toUpperCase']()['replace'](/\W/g,''))return!![];if(l['match'](/^chapter.+\./i)&&l['toLowerCase']()['match'](/^chapter.+\./)[0x0]==m['toLowerCase']()['match'](/^chapter.+\./)[0x0])return!![];return![];}}());
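
The limitation about chapter headings not matching the TOC follows from how the script compares the two. Below is a readable sketch of that comparison, based on the matching function in the minified code above. The name `sameHeading` and the null guard on the second check are assumptions added for this example.

```javascript
// Returns true when a TOC entry and a line of body text look like the same heading.
function sameHeading(a, b) {
  if (!a || !b) return false;

  // Check 1: identical once case and all non-word characters are ignored.
  const normalize = (s) => s.toUpperCase().replace(/\W/g, '');
  if (normalize(a) === normalize(b)) return true;

  // Check 2: both start with the same "chapter ... ." prefix.
  // The null guard is an addition for this sketch.
  const prefix = (s) => {
    const m = s.toLowerCase().match(/^chapter.+\./);
    return m ? m[0] : null;
  };
  const prefixA = prefix(a);
  return prefixA !== null && prefixA === prefix(b);
}

// Usage
console.log(sameHeading('Chapter IV.', 'CHAPTER IV'));              // punctuation and case differ
console.log(sameHeading('Chapter I. The Beginning', 'CHAPTER I.')); // same "chapter i." prefix
console.log(sameHeading('Chapter IV', 'IV'));                       // the case from the limitations list
```

The first two calls match; the third does not, which is why a TOC entry of "Chapter IV" is not paired with a body heading of "IV".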


For questions or issues with either of the above scripts, please post in this forum thread, or send a Private Message to quartertone.

Note: The scripts above have been minified. To view the scripts in a human-readable format, please visit vox.quartertone.net.

Websites

Various websites provide an interface to count the number of words in copy-pasted text.

Document Editors

Most document editors (Microsoft Word, Google Docs, LibreOffice, etc.) and some basic text editors have a built-in word counting feature. Copy and paste your text into a document, then select the text you want to count.

  • Microsoft Word
    • Review → Word Count
    • MS Word also displays the word count on the bottom status bar automatically, unless this feature has been disabled. To re-enable it, right-click the status bar and tick the Word Count option. Highlight the words you want counted, either part or all of the text, and after a moment the word count appears in the lower left corner.
  • LibreOffice Writer
    • Tools → Word Count
    • A running word count is displayed on the bottom status bar. The status bar cannot be modified, but the entire bar can be shown or hidden by selecting View → Status Bar.
  • Google Docs
    • Tools → Word count
    • Keyboard shortcut: Ctrl + Shift + C