Word Count: Difference between revisions

From Librivox wiki
(→‎Other Alternatives: New bookmarklets, removed defunct site, added web based counters, expanded Text editors section)
Line 42: Line 42:
If you have any problems installing or using this script you can either post a message in [http://forum.librivox.org/viewtopic.php?t=24437 this forum thread] or send a Private Message to ''peegee''.


== Other Alternatives ==
=== Web Browser Bookmarklets ===


=== Script for Gutenberg Texts ===
==== Gutencount (word counter) ====
This website script will count out the number of words in each chapter. Enter the Gutenberg project ID number, fill in additional information (how many chapters there are, how the chapter headers are written, etc) and it will return the chapter word counts and the first and last words in the chapter.
This is a "universal" word counter bookmarklet. The code below will, with a single click, return the word count for the body of a Gutenberg book.


https://karikarito.com/wordcount
* It counts the ebook text (everything between "*** START OF _____ ***" and "*** END OF ____ ***"), excluding the Gutenberg disclaimer and legalese. This works whether you are on the [https://www.gutenberg.org/ebooks/33843 main page], the [https://www.gutenberg.org/ebooks/33843.html.images HTML page] (or [https://www.gutenberg.org/files/33843/33843-h/33843-h.htm as-submitted]), or the [https://www.gutenberg.org/ebooks/33843.txt.utf-8 plain text page].
** Counts may differ slightly between formats because some include extra text (Transcriber's notes, book summary, subtitles, etc.) that others omit.
* BONUS: On a normal webpage, it will count all the text on the page.
* BONUS: If any text is highlighted, it will count the words in the selection.


=== Web Browser Bookmarklet ===
<nowiki>javascript:!function(){let t=document.location.toString(),e=window.getSelection()+"";function n(t,e){o(t.match(/(?<=\*{3} START.*?\*{3}).*(?=\*{3} END.*?\*{3})/s)[0].trim(),e)}function o(t,e){t.trim().match(/[\*\S]+/g),alert(t.trim().match(/[\*\S]+/g).length+" words, "+t.length+" chars\nin "+e)}var c;e.length>0?o(e,"selection"):t.match(/gutenberg.org\/(files|cache\/epub)\/\d+/)?n(document.body.innerText,"ebook"):t.match(/gutenberg.org\/ebooks\/(\d+)/)?(c=document.location+".txt.utf-8",fetch(c).then(t=>t.text()).then(t=>n(t,"ebook")).catch(t=>alert(t))):o(document.body.innerText,"webpage")}();</nowiki>
Creating a new bookmark, with the code below, will count the words in a highlighted piece of text.


You can use ''Word Count'' for the bookmark name, and then put the code into the address box, in the ''details'' section.


javascript:(function(){var t;if (window.getSelection) t = window.getSelection();else if (document.selection) t = document.selection.createRange();if (t.text != undefined) t = t.text;if(!t || t == ""){ a = document.getElementsByTagName("textarea"); for(i=0; i<a.length; i++) { if(a[i].selectionStart != undefined && a[i].selectionStart != a[i].selectionEnd) { t = a[i].value.substring(a[i].selectionStart, a[i].selectionEnd); break; } }}if(!t || t == "")alert("please select some text");else alert("word count: " + t.toString().match(/(\S+)/g).length);})()
==== Chapter Counter (beta) ====


=== Microsoft Office Word ===
This script will (attempt to) count the number of words in indexed chapters of a Gutenberg book.
Microsoft Word, and probably other text document creation software, has built-in word counting software. Copy and paste your text into a blank page, Select All (or highlight the portion you are word counting), and go Review/Word Count.
Navigate to the HTML or the plain text page, and activate the script.
 
Current Limitations:
* The Table of contents (TOC) must be present, and labeled ''Contents''.
* TOC should not include any sections that appear before the contents list (eg, ''Preface'').
* Does not work if chapter headings do not match the TOC (eg, TOC lists ''Chapter IV'', but chapter headings appear as ''IV'').
* Does not work if the TOC is formatted strangely (eg page numbers or other text interspersed between the chapter titles).
 
<nowiki>javascript:!function(){if(document.location.toString().match(/gutenberg.org\/(files|cache\/epub)\/\d+/)){let i=document.body.innerText.match(/(?<=\*{3} START.*?\*{3}).*(?=\*{3} END.*?\*{3})/s)[0].trim().split(/\n+/),o=[],c="",r="",d={};for(var t of i)if((t=t.trim())&&0!=t.length)if(c||!t.match(/^contents.?$/i)){if("contents"==c)o.length>0&&n(o[0],t)?(c="body",d[r=o[0]]=""):o.push(t);else if("body"==c){if(void 0!==o[o.indexOf(r)+1]&&n(t,o[o.indexOf(r)+1])){d[r=o[o.indexOf(r)+1]]="";continue}d[r]+=t+" "}}else c="contents";var e=document.getElementById("xcount")||document.createElement("div");e.id="xcount",e.style="position:fixed;top:0;right:0;width:20em;height:20em;overflow-y:scroll;background:#333c;color:#fff;",e.innerHTML="",document.body.appendChild(e);for(const t in d)e.innerHTML+=t+": "+d[t].trim().match(/[\*\S]+/g).length+" words<br/><br/>"}function n(t,e){return!(!t||!e)&&(t==e||(!(!t.match(/^chapter.*\./i)||t.match(/^chapter.*\./i)[0]!=e.match(/^chapter.*\./i)[0])||void 0))}}();</nowiki>
 
 
For questions or issues with either of the above scripts, please post in [https://forum.librivox.org/viewtopic.php?t=96792 this forum thread], or send a Private Message to ''quartertone''.
 
Note: The scripts above have been ''minified''. To view the scripts in a human-readable format, please visit [https://vox.quartertone.net vox.quartertone.net].
 
 
=== Websites ===
Various websites provide an interface to count the number of words in copy-pasted text. Below are some sites that provide accurate word counts:
 
* [https://wordcounter.net wordcounter.net]
* [https://wordcounter.io/ wordcounter.io]
* [https://thewordcounter.com/ thewordcounter.com]
* [https://easywordcount.com/ easywordcount.com]
 
 
=== Document Editors ===
 
Most document editors (Microsoft Word, Google Docs, LibreOffice, etc.) and some basic text editors have a built-in word counting feature. Copy and paste your text into a document, select the text you want to count, and use the menu option listed for your editor.
 
* Microsoft Word
** Review &rarr; Word Count
** MS Word will also automatically display the word count on the bottom status bar, unless this feature has been disabled. To re-enable it, right-click on the bottom status bar and tick the Word Count option.
* Google Docs
** Tools &rarr; Word count
** Keyboard shortcut: '''Ctrl + Shift + C'''
* LibreOffice Writer
** Tools &rarr; Word Count

Revision as of 02:13, 20 February 2023

LibriVox member peegee has written a script for web browsers which may make the BC's job of compiling word counts for the Magic Window a little easier.

How it Works

The script runs against the HTML ebooks on Project Gutenberg.

  1. You click the paragraph where you want the count to start.
  2. The script asks for the target number of words.
  3. It goes through every paragraph from that point onwards, counting the words in each and keeping a running total.
  4. It stops when it reaches the target, or at the end of the chapter if that comes first.
  5. The page is temporarily changed to display the word counts at the end of each paragraph.
  6. You can repeat this as many times as you like; each time you click a paragraph, the previous temporary page changes are removed.
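
The counting steps above can be sketched roughly as follows. This is a minimal illustration and not peegee's actual script: the function names `countWordsIn` and `countFrom` are hypothetical, the paragraphs are passed in as plain strings rather than read from the page, and it assumes the paragraph that crosses the target is included in the result.

```javascript
// Hypothetical helper: count whitespace-separated words in a string.
function countWordsIn(text) {
  const words = text.trim().match(/\S+/g);
  return words ? words.length : 0;
}

// Walk the paragraphs from startIndex, recording each paragraph's word count
// and the running total, and stop once the running total reaches the target
// (or when the paragraphs run out, i.e. the end of the chapter).
function countFrom(paragraphs, startIndex, target) {
  const rows = [];
  let total = 0;
  for (let i = startIndex; i < paragraphs.length; i++) {
    const words = countWordsIn(paragraphs[i]);
    total += words;
    rows.push({ index: i, words, total });
    if (total >= target) break;
  }
  return rows;
}

// Made-up sample chapter for illustration.
const sampleChapter = [
  "It was a dark and stormy night.",
  "The rain fell in torrents.",
  "Except at occasional intervals.",
];
console.log(countFrom(sampleChapter, 0, 10));
```

In a browser, the paragraph strings would presumably come from the page's paragraph elements, and the per-paragraph totals would be written back next to each one.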

Screenshots

Here are a few screenshots to illustrate the process:

Installing the Script

The method of installation depends on the browser. Firefox and Chrome may need to be restarted after installation; Opera does not.

Firefox

  1. Greasemonkey: You will first need the Greasemonkey add-on for Firefox, which is available from this link.
  2. Firefox Install: Once you have Greasemonkey installed, go here and click on the Install button.

Google Chrome

  1. Tampermonkey: First you'll need to install Tampermonkey from the Chrome store.
  2. WordCount: Then you'll be able to install the word count extension from here.

Opera

  1. To enable User JavaScript, use Tools > Preferences > Advanced > Content > JavaScript options, and select the directory where you will put your User JavaScript files (probably best if it's a new folder with nothing else in it).
  2. Then go to the script.
  3. Click the Install button to open the script in a new tab in Opera.
  4. From the Opera menu, click File > Save As to save it into the folder you chose in the first step. It is probably best to keep the suggested file name; whatever name you choose, it MUST end in .user.js.

Limitations

The script only works on the Gutenberg online HTML books, not the text or zipped HTML, or other formats.

Support

If you have any problems installing or using this script you can either post a message in this forum thread or send a Private Message to peegee.

Web Browser Bookmarklets

Gutencount (word counter)

This is a "universal" word counter bookmarklet. The code below will, with a single click, return the word count for the body of a Gutenberg book.

  • It counts the ebook text (everything between "*** START OF _____ ***" and "*** END OF ____ ***"), excluding the Gutenberg disclaimer and legalese. This works whether you are on the main page, the HTML page (or as-submitted), or the plain text page.
    • Counts may differ slightly between formats because some include extra text (Transcriber's notes, book summary, subtitles, etc.) that others omit.
  • BONUS: On a normal webpage, it will count all the text on the page.
  • BONUS: If any text is highlighted, it will count the words in the selection.
javascript:!function(){let t=document.location.toString(),e=window.getSelection()+"";function n(t,e){o(t.match(/(?<=\*{3} START.*?\*{3}).*(?=\*{3} END.*?\*{3})/s)[0].trim(),e)}function o(t,e){t.trim().match(/[\*\S]+/g),alert(t.trim().match(/[\*\S]+/g).length+" words, "+t.length+" chars\nin "+e)}var c;e.length>0?o(e,"selection"):t.match(/gutenberg.org\/(files|cache\/epub)\/\d+/)?n(document.body.innerText,"ebook"):t.match(/gutenberg.org\/ebooks\/(\d+)/)?(c=document.location+".txt.utf-8",fetch(c).then(t=>t.text()).then(t=>n(t,"ebook")).catch(t=>alert(t))):o(document.body.innerText,"webpage")}();
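
For readers who want to follow the logic without unpacking the minified bookmarklet, here is a rough, readable sketch of its core idea: take the text between the START and END markers and count the runs of non-whitespace characters. It is not the author's unminified source. The names `extractEbookBody` and `countWordsAndChars` are hypothetical, and returning `null` when the markers are missing is an assumption added for this sketch.

```javascript
// Hypothetical restatement of the marker-based extraction.
function extractEbookBody(text) {
  // The "s" flag lets "." match newlines, so the capture group can span
  // the whole book between the two "*** ... ***" marker lines.
  const match = text.match(/\*{3} START.*?\*{3}(.*)\*{3} END.*?\*{3}/s);
  return match ? match[1].trim() : null;
}

// Count words (runs of non-whitespace) and characters in the trimmed text.
function countWordsAndChars(text) {
  const trimmed = text.trim();
  const words = trimmed.match(/\S+/g) || [];
  return { words: words.length, chars: trimmed.length };
}

// Made-up miniature "ebook" for illustration.
const sampleEbook = [
  "Title page and license header",
  "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***",
  "One two three.",
  "Four five.",
  "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***",
  "License footer",
].join("\n");

console.log(countWordsAndChars(extractEbookBody(sampleEbook)));
```

The sketch works on a plain string, so the same two functions would apply whether the text comes from the page body, a fetched plain text file, or the current selection.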


Chapter Counter (beta)

This script will (attempt to) count the number of words in indexed chapters of a Gutenberg book. Navigate to the HTML or the plain text page, and activate the script.

Current Limitations:

  • The Table of contents (TOC) must be present, and labeled Contents.
  • TOC should not include any sections that appear before the contents list (eg, Preface).
  • Does not work if chapter headings do not match the TOC (eg, TOC lists Chapter IV, but chapter headings appear as IV).
  • Does not work if the TOC is formatted strangely (eg page numbers or other text interspersed between the chapter titles).
javascript:!function(){if(document.location.toString().match(/gutenberg.org\/(files|cache\/epub)\/\d+/)){let i=document.body.innerText.match(/(?<=\*{3} START.*?\*{3}).*(?=\*{3} END.*?\*{3})/s)[0].trim().split(/\n+/),o=[],c="",r="",d={};for(var t of i)if((t=t.trim())&&0!=t.length)if(c||!t.match(/^contents.?$/i)){if("contents"==c)o.length>0&&n(o[0],t)?(c="body",d[r=o[0]]=""):o.push(t);else if("body"==c){if(void 0!==o[o.indexOf(r)+1]&&n(t,o[o.indexOf(r)+1])){d[r=o[o.indexOf(r)+1]]="";continue}d[r]+=t+" "}}else c="contents";var e=document.getElementById("xcount")||document.createElement("div");e.id="xcount",e.style="position:fixed;top:0;right:0;width:20em;height:20em;overflow-y:scroll;background:#333c;color:#fff;",e.innerHTML="",document.body.appendChild(e);for(const t in d)e.innerHTML+=t+": "+d[t].trim().match(/[\*\S]+/g).length+" words<br/><br/>"}function n(t,e){return!(!t||!e)&&(t==e||(!(!t.match(/^chapter.*\./i)||t.match(/^chapter.*\./i)[0]!=e.match(/^chapter.*\./i)[0])||void 0))}}();
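
The limitations listed above follow from how a contents-driven counter has to work. The sketch below is a simplified illustration and not the author's code: the function name `countChapters` is hypothetical, it assumes the text has already been trimmed to the ebook body, and it only matches headings that are identical to their TOC entries.

```javascript
// Hypothetical sketch: read the TOC after a "Contents" line, then treat each
// later line that equals the next expected TOC entry as a chapter heading.
function countChapters(bodyText) {
  const lines = bodyText.split(/\n+/).map((l) => l.trim()).filter(Boolean);
  const toc = [];            // chapter titles collected from the contents list
  const counts = new Map();  // chapter title -> word count
  let state = "start";       // "start" -> "contents" -> "body"
  let next = 0;              // index in toc of the next heading expected
  let current = null;        // chapter currently being counted

  for (const line of lines) {
    if (state === "start") {
      // Ignore everything until the line that labels the contents list.
      if (/^contents.?$/i.test(line)) state = "contents";
    } else if (state === "contents" && !(toc.length > 0 && line === toc[0])) {
      // Still inside the contents list: every line is taken as a chapter title.
      toc.push(line);
    } else if (next < toc.length && line === toc[next]) {
      // A line equal to the next expected title starts that chapter.
      state = "body";
      current = toc[next++];
      counts.set(current, 0);
    } else {
      // Ordinary body line: add its words to the current chapter.
      const words = line.match(/\S+/g) || [];
      counts.set(current, counts.get(current) + words.length);
    }
  }
  return counts;
}

// Made-up miniature book for illustration.
const sampleBook = [
  "Contents",
  "Chapter I",
  "Chapter II",
  "Chapter I",
  "One two three four.",
  "Five six.",
  "Chapter II",
  "Seven eight nine.",
].join("\n");

console.log(Object.fromEntries(countChapters(sampleBook)));
```

Because every line between "Contents" and the first repeated title is treated as a chapter title, page numbers or other stray text in the TOC would be mistaken for chapters, and a heading that differs from its TOC entry would never be recognized.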


For questions or issues with either of the above scripts, please post in this forum thread, or send a Private Message to quartertone.

Note: The scripts above have been minified. To view the scripts in a human-readable format, please visit vox.quartertone.net.


Websites

Various websites provide an interface to count the number of words in copy-pasted text. Below are some sites that provide accurate word counts:

  • wordcounter.net
  • wordcounter.io
  • thewordcounter.com
  • easywordcount.com


Document Editors

Most document editors (Microsoft Word, Google Docs, LibreOffice, etc.) and some basic text editors have a built-in word counting feature. Copy and paste your text into a document, select the text you want to count, and use the menu option listed for your editor.

  • Microsoft Word
    • Review → Word Count
    • MS Word will also automatically display the word count on the bottom status bar, unless this feature has been disabled. To re-enable it, right-click on the bottom status bar and tick the Word Count option.
  • Google Docs
    • Tools → Word count
    • Keyboard shortcut: Ctrl + Shift + C
  • LibreOffice Writer
    • Tools → Word Count