I have mixed-language text files, and would like to count the total number of printable characters belonging to just one of the languages. It helps that the languages occupy different Unicode ranges.
My specific use-case involves Hebrew, Polytonic Greek, and English -- but I imagine a solution to this problem could be generalized for other contexts, too.
I would like to count the Hebrew characters only -- that's Unicode [\u0590-\u05ff]. Here's a brief sample input file (which, by my manual count, contains 62 Hebrew characters):
[ Ps117 ]
h1: הללו את יהוה כל גוים שבחוהו כל האמים
r1: Praise the LORD, all nations! Extol him, all peoples!
g1: Αλληλουια. Αἰνεῖτε τὸν κύριον, πάντα τὰ ἔθνη, ἐπαινέσατε αὐτόν, πάντες οἱ λαοί,
b1: Alleluia. Praise the Lord all you nations: praise him all you peoples.
h2: כי גבר עלינו חסדו ואמת יהוה לעולם הללו יה
r2: For great is his steadfast love toward us; and the faithfulness of the LORD endures for ever. Praise the LORD!
g2: ὅτι ἐκραταιώθη τὸ ἔλεος αὐτοῦ ἐφ' ἡμᾶς, καὶ ἡ ἀλήθεια τοῦ κυρίου μένει εἰς τὸν αἰῶνα.
b2: For his mercy has been abundant toward us: and the truth of the Lord endures for ever.
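To pin down exactly what I mean by the count, here is a minimal sketch in Python 3. It is only an illustration of the desired result, not necessarily the best tool for the job. The function name `count_hebrew` and the two range constants are made up for this example, and it assumes the file is UTF-8 encoded. Note that it counts every code point in the range, so vowel points and cantillation marks would be included if the text had them (my sample does not).

```python
import sys

# Bounds of the range given above, [\u0590-\u05ff] (names are illustrative).
HEBREW_FIRST = 0x0590
HEBREW_LAST = 0x05FF


def count_hebrew(text):
    """Return how many code points in text fall inside the Hebrew range."""
    return sum(1 for ch in text if HEBREW_FIRST <= ord(ch) <= HEBREW_LAST)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the input file is UTF-8 encoded.
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(count_hebrew(handle.read()))
```

Saved under a hypothetical name such as `count_hebrew.py`, it would be run as `python3 count_hebrew.py input.txt`. Spaces, Latin letters, and Greek letters all fall outside the range, so they are ignored.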
I'm on Ubuntu 16.04.2 LTS, if that helps. I imagine perl would be a likely option here, or some shell script ... but I don't know these things, which is why I'm asking!
For the curious, the lines in my input are: h = Hebrew; r = Revised Standard Version; g = Greek Septuagint; b = Brenton translation of the Septuagint; in each case followed by a verse number.