Monday, October 13, 2014

Tengwar Transliterator 1.2

The new virtual Tengwar keyboard now makes
Tengwar input easier (still a work in progress)
The Tengwar Transliterator has seen some good progress lately, so I decided to publish a new version. Check it out here.

If you're interested, here is a summary of what has changed:

Tengwar Virtual Keyboard
When I added "reverse transliteration" (Tengwar to Latin) in the previous update, I wanted to come up with a way to make Tengwar input easier, so I developed a Tengwar virtual keyboard. It's basically a little character map widget. I don't consider it finalized, but it works decently, and is certainly faster than looking up and typing control codes for the extended Tengwar characters.

Numeral Support
I finally got around to adding support for numeral transliteration, which required learning how numerals in Tengwar actually work. Turns out there are a few variations for this, so I added some configuration options and set the default to what I gather is the most authentic mode, which is a base-12 system with the least significant digit on the left (i.e. "backwards" if you're used to Arabic numerals). Transliteration back to Arabic numerals is only partially supported at the moment.
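As a rough sketch of the default numeral mode described above (base 12, least significant digit first), the digit order can be computed as follows. The function name is hypothetical and is not taken from the transliterator's source; mapping each digit to its Tengwar numeral glyph is left out.

```php
<?php
// Hypothetical helper, not the transliterator's actual code.
// Returns the base-12 digits of a non-negative integer,
// least significant digit first (the order described above).
function toDozenalDigits(int $n): array
{
    if ($n < 0) {
        throw new InvalidArgumentException('Expected a non-negative integer');
    }
    $digits = [];
    do {
        $digits[] = $n % 12;  // peel off the lowest base-12 digit
        $n = intdiv($n, 12);  // move on to the next base-12 place
    } while ($n > 0);
    return $digits;
}

// 2014 = 1*1728 + 1*144 + 11*12 + 10
echo implode(' ', toDozenalDigits(2014)), "\n"; // prints "10 11 1 1"
```

Reversing the returned array gives the most-significant-first order that readers of Arabic numerals expect.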

Word Spacing Options
In some situations, Tengwar is written without spacing between words, so I put in an option for this. When spacing is removed, the implementation uses the Unicode zero-width space (U+200B). Applications that support Unicode recognize this invisible character as a word boundary, so Tengwar text that is encoded this way will not have visible spacing between words, but lines will correctly wrap without cutting words in half.
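A minimal sketch of the spacing option, assuming the text is UTF-8 and that words are separated by ordinary spaces or tabs. The function name is made up for illustration, and the sample uses Latin letters only so the result is easy to inspect.

```php
<?php
// Hypothetical helper, not the transliterator's actual code.
// Replaces each run of spaces/tabs with a single zero-width space (U+200B),
// so words are still separated for line wrapping but show no visible gap.
function removeVisibleSpacing(string $text): string
{
    $zwsp = "\u{200B}"; // PHP 7+ encodes this escape as UTF-8 (bytes E2 80 8B)
    return preg_replace('/[ \t]+/u', $zwsp, $text);
}

$out = removeVisibleSpacing('speak friend and enter');
echo strlen($out), "\n"; // 19 letters + 3 separators of 3 bytes each = 28
```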

It Looks a Little Better
I am no designer, but I added some CSS to make the whole thing look a little nicer (in my opinion).

That's it for now. Enjoy (all one or two of you, that is)!

Tuesday, September 30, 2014

Tengwar Transliteration Progress

In a rare departure from my usual habits, I've managed to maintain enough interest in a hobby programming project to actually have an update beyond the initial release. Therefore, I present to you the Tengwar Transliterator version 1.1.

You can check it out here.

The tricky word "governments" rendered in Tengwar Telcontar, Tengwar Formal CSUR, and FreeMono Tengwar

Its new features, in a nutshell, include:

  • Orthographic transliteration - This is now the default transliteration mode. It provides a letter-by-letter approach to transliteration, as opposed to the phonetic approach in the first release. Phonetic mode is still available, but the orthographic mode produces output that is much closer to how people typically transliterate English into Tengwar.
  • Reverse transliteration - It is now possible to transliterate back to Latin characters from Tengwar. This feature should be considered beta, however.
  • Tengwar font selection - Users may now choose one of the three fonts provided by the Free Tengwar Project for displaying Tengwar text. Firefox users are strongly encouraged to choose the excellent Tengwar Telcontar font. Sadly, this Graphite font does not display correctly in other browsers, to my knowledge.

What's Next?

  • The results of the orthographic mode are actually even better than I expected. With some key enhancements, it could be extended into a Common Mode transliterator, which has been my goal the entire time. I hope to release the Common Mode transliterator some time this autumn.
  • Now that transliteration from Tengwar back to Latin characters is supported, I would also like to integrate a virtual Tengwar keyboard to make Tengwar input much easier.
  • I am also looking forward to broadening the scope and adding support for other Middle-earth alphabets such as the Cirth, and the futhark-style runes found in The Hobbit.
  • Code refactoring! This is badly needed. A lot of the new features have been pretty rapidly prototyped, and I'm looking forward to refactoring the code into something nice.
Have a feature in mind that you'd like to see, or have other feedback? Leave a comment or get in touch with me.

And as always, the code is available on GitHub. It's a horrible monstrosity at the moment, though.

Tuesday, June 17, 2014

Sour Shchi (Russian Cabbage Soup)

I wasn't planning on a food post for this,
so this is the only photo you get...
What do you do when you have an abundance of homemade lacto-fermented sauerkraut at home? Apparently one option is to make sauerkraut soup. When I learned about shchi, a Russian cabbage soup, and how you can make sour shchi by using sauerkraut in the recipe, I decided to go for it. There really aren't a lot of foods I wouldn't try at least once. Since the end product was surprisingly delicious, with only a subtle sour flavor that melded excellently with the sweet chicken broth, I thought I'd go ahead and share my process for making sour shchi from scratch.

This recipe works well as a template. Change up the vegetables to whatever you think would be good.

Sour Shchi (Russian Cabbage Soup)

Makes 4 servings

INGREDIENTS

Broth

  • 2-3 lbs chicken parts (I prefer wings - an 8 pack works well)
  • Approximately 2 tbsp evaporated milk powder (flour can be substituted here)
  • 2 tbsp vegetable oil
  • 8 cups cold water
  • 1 large yellow onion, peeled and quartered
  • 4 cloves garlic, smashed and peeled
  • 1 bay leaf
  • 2 tsp Kosher salt

Soup

  • 2 large baking potatoes, peeled and chopped
  • 2 cups German-style sauerkraut (or shredded fresh cabbage if you insist)
  • 1 large carrot, peeled and chopped into 1 cm cubes
  • 1 medium onion, peeled and chopped
  • 1 cup cooked chicken, chopped

Garnish

  • Sour cream
  • Fresh dill
  • Chopped green onions

PROCEDURE

Broth

If you don't have some good, homemade chicken stock on hand, here is a method you can use to prepare a tasty broth from scratch relatively quickly (about 45 minutes). While it is not necessary to make your own stock or broth, I find that doing so enhances the quality of homemade soups so much that I rarely consider making soup worthwhile if I don't have time for it. If you want to skip this phase, you can use 64 oz. of store-bought chicken or vegetable broth.

  1. Heat vegetable oil in a medium stainless steel stock pot on medium high-ish heat (I set the dial to the line between medium and medium high heat)
  2. While oil is heating, coat chicken pieces lightly with milk powder
  3. Brown chicken pieces for 5 minutes per side. Do so in two batches if necessary. Don't get freaked out when chicken pieces stick and the skin is blackening in spots.
  4. Remove chicken pieces from stock pot and pour in cold water to deglaze. Scrape and loosen bits of chicken that are still stuck to the stock pot, but don't worry about clearing up the entire surface.
  5. Bring liquid to a simmer and add browned chicken pieces, onion, garlic, bay leaf, and Kosher salt.
  6. Simmer, partially covered, for 20 minutes
  7. Remove chicken pieces from the broth and take the broth off the heat.
  8. Strain broth through a fine sieve and then discard the leftover bits of onion, garlic and bay leaf.
  9. Let cooked chicken cool before removing skin and chopping up 1 cup of the meat for the soup. The rest you can save for chicken salad or whatever else you want to do with it. When I use an 8-pack of wings, my rule of thumb is to use the meat from 4 of them for the soup and save the rest for future use.

Soup

If you made your own broth, you may wish to wash your stock pot to eliminate any lingering bits of chicken that are still stuck to the bottom.

  1. Return chicken broth to the stock pot and heat back up to a simmer
  2. Add potatoes and sauerkraut to the pot and simmer for 20 minutes, or until potatoes are cooked
  3. About 7 minutes before the potatoes are due to finish cooking, add carrot and onion to the pot
  4. Add chopped chicken, let it reheat for 30 seconds, then immediately dish soup into bowls
  5. Garnish each bowl with a dollop of sour cream, green onions, and fresh dill

Tuesday, June 10, 2014

Phonetic Transliteration of English into Tengwar

One of the things I love about The Lord of the Rings is that J. R. R. Tolkien originally created the setting of Middle-earth as a place where the characters could speak the various invented languages that he spent most of his life working on. Tolkien's languages have interesting grammatical properties, but the focus of this post is merely on one of the alphabets that Tolkien devised for writing his languages. Specifically, I'd like to share the experience I've had so far working on a Web-based transliterator for automatically converting English text from Latin letters to Tolkien's Tengwar alphabet.

If you couldn't care less about all the technical details and just want to mess around with the tool (and help me find bugs), you can go check it out here.

If you're a real nerd, read on...

Some Challenges of Creating a Tengwar Transliterator

Note: If you happen to know a thing or two about writing English in Tengwar, then I should clarify up front that what I have done so far is implement a phonetic transliterator. The so-called Common Mode is my ultimate goal, but as it introduces some additional difficult challenges, I haven't gotten there yet. I can't find an example on the Internet of anyone actually having pulled this off, so it may be a bit of a lofty goal.

Having essentially dived into this project on a whim, I ended up running into more issues than I initially anticipated. They break down into the following problem areas, each of which I touch on below.
  1. Orthographic differences between the Latin and Tengwar alphabets
  2. Programmatically determining pronunciations of English words
  3. Digital encoding of Tengwar characters and font selection
  4. Displaying Tengwar characters on the Web and browser compatibility issues
To implement reverse transliteration of Tengwar back into Latin text, the following additional problem areas emerge:
  5. Keyboard (or otherwise) input of Tengwar characters
  6. A less easily overcome version of problem 1
Four annoying problems seemed like enough, so I haven't implemented reverse transliteration yet, although I intend to in the future. I've been enjoying pondering how to deal with problems 5 and 6.

Orthographic Differences Between the Latin and Tengwar Alphabets

I'm not going to provide anything in the way of a tutorial for how to read and write Tengwar. If you're interested in that, here are some useful links:

Tengwar and Latin orthography are very different when it comes to writing English, which has very irregular pronunciation. As one of countless examples, consider words like 'daughter', 'laughter', and 'aghast'. In each of these words we encounter a <gh> digraph with a different pronunciation. There are historical and etymological explanations for irregularities such as this, but the bottom line is that they create headaches for the aspiring Latin-to-Tengwar transcriber.

The primary cause of these headaches is that Tengwar is a phonetic alphabet. Its characters are organized into four series, or témar, and six grades, or tyeller, which represent the place and manner of articulation for the sound represented by each character. In less technical terms, that means that similar-looking Tengwar characters tend to represent similar phonetic sounds, which is pretty cool.

The simplest way around this orthographic discrepancy was to aim for a phonetic transliteration for now. That means the computer needs to handle these unpredictably spelled English words and convert them to phonetic representations in Tengwar. This led me to my next problem.

Programmatically Determining Pronunciations of English Words

How can a computer tell how an English word is pronounced? I wasn't quite sure at first. Very similar words like 'laughter' and 'daughter' with different pronunciations made me seriously doubt the efficacy of trying to determine a word's pronunciation by algorithm. Instead, I assumed there must be some kind of database or Web service that I could use.

While assessing my options, I discovered The CMU Pronouncing Dictionary from Carnegie Mellon University. It is no surprise that CMU came to my aid, as they are a research leader in text/speech analysis. The CMU Pronouncing Dictionary contains North American English pronunciations for 125,000 words (including common proper nouns), and it is machine-readable. This was my missing link.

I wrote a thin object-oriented wrapper around this dictionary file in PHP which allowed me to access pronunciation information in my programs. Here is a succinct usage example:

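<?php
// Pronunciation is the wrapper class around the CMU dictionary file described above.
$words = array('daughter', 'laughter', 'aghast');

foreach ($words as $word) {
  // Look up the word and print its phoneme string
  $p = new Pronunciation($word);
  echo "$word is pronounced / $p->pronunciation /\n";
}
?>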

The above outputs:

daughter is pronounced / D AO1 T ER0 /
laughter is pronounced / L AE1 F T ER0 /
aghast is pronounced / AH0 G AE1 S T /

As you can see, pronunciations are represented as a space-delimited sequence of phonemes. The CMU dictionary defines 39 phonemes for North American English. The exact number of phonemes in the English language is dialect-dependent, but I had to pick something and go with it. The numbers at the end of vowel phonemes represent the relative amount of stress that is placed on each syllable. For my purposes, I disregard the numbers, because Tengwar does not denote stress.
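To illustrate how the stress digits can be dropped, here is a small sketch that parses one dictionary entry. It assumes the plain-text CMU format of a word followed by whitespace and space-separated phonemes, with comment lines starting with ";;;". The function name is hypothetical and this is not the wrapper class used in the example above.

```php
<?php
// Hypothetical helper, not the author's wrapper class.
// Parses one line of the CMU dictionary file and strips stress markers.
function parseCmuLine(string $line): ?array
{
    $line = trim($line);
    if ($line === '' || strpos($line, ';;;') === 0) {
        return null; // blank line or comment line
    }
    $parts = preg_split('/\s+/', $line);
    $word = strtolower(array_shift($parts));
    // Vowel phonemes end in a stress digit (0, 1 or 2); remove it.
    $phonemes = array_map(function ($p) {
        return rtrim($p, '012');
    }, $parts);
    return ['word' => $word, 'phonemes' => $phonemes];
}

$entry = parseCmuLine('DAUGHTER  D AO1 T ER0');
echo $entry['word'], ': ', implode(' ', $entry['phonemes']), "\n"; // daughter: D AO T ER
```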

Note: One obvious flaw with this solution is that non-English words cannot currently be transliterated. I'm considering writing in some code to fall back to a simple orthographic transliteration for words that aren't in the dictionary. We'll see.
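One way such a fallback might look, purely as a sketch: the function name, the shape of the $dictionary array (lowercase word to phoneme list), and the returned structure are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of a dictionary lookup with an orthographic fallback.
// $dictionary maps lowercase words to lists of phonemes.
function lookupOrFallback(string $word, array $dictionary): array
{
    $key = strtolower($word);
    if (isset($dictionary[$key])) {
        return ['mode' => 'phonetic', 'units' => $dictionary[$key]];
    }
    // Word not found: fall back to its individual letters (UTF-8 aware split).
    $letters = preg_split('//u', $key, -1, PREG_SPLIT_NO_EMPTY);
    return ['mode' => 'orthographic', 'units' => $letters];
}

$dictionary = ['daughter' => ['D', 'AO', 'T', 'ER']];
print_r(lookupOrFallback('Daughter', $dictionary)); // phonetic units
print_r(lookupOrFallback('mellon', $dictionary));   // letter-by-letter units
```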

But aside from this caveat, I consider the problem solved. At this point, I was able to move on to defining a mapping between the CMU phoneme set and the Tengwar characters, although this introduced more questions.

Digital Encoding of Tengwar Characters and Font Selection

Just what is the best way to digitally represent these weird Tengwar characters? You may be intuitively thinking something along the lines of, well, can't you just use a special Tengwar font that maps the glyphs we need over the A-Z character set? Back in the 1990s when Tolkien enthusiasts everywhere were discovering the Internet, this actually wasn't a bad solution, and there are still a lot of old sites on the Internet where this is how they do it. There are some substantial problems with this approach, though.

First of all, a true one-to-one mapping between Latin and Tengwar letters is impossible. The Latin alphabet has 26 letters, while the Tengwar alphabet has 36 letters (tengwar) and no fewer than 8 diacritics (tehtar), depending on the writing mode in use. Like the writing systems of Arabic and Hindi, Tengwar typically denotes consonant sounds with characters and adjacent vowel sounds using diacritics over or under them. There are also several digraphs in English such as <th>, <sh>, and <ch> that are represented by only one letter in Tengwar. A simple example is mellon, the password used by the Fellowship to open the Moria gate. Here it is in Tengwar, requiring three characters to write.

The Unicode character encoding standard helps us deal with this more elegantly by specifying in fairly precise terms how non-Latin characters should be encoded. While the Unicode standard is silent on how to specifically encode the fictional Tengwar alphabet, it does provide what is called the Private Use Area, which is basically a range of characters that are left undefined so that they can be used for very specialized applications.

The Free Tengwar Project has come up with a Unicode character mapping using the Private Use Area, and this seemed like the most valid approach to me. They provide three different free fonts, each with different pros and cons. They also provide a keyboard layout for typing Tengwar, if you're a huge enough nerd to need one.

Having opted to use the Free Tengwar Project's fonts and encoding rules, all I had to do was create a set of rules for mapping the CMU phonetic pronunciations into Unicode Tengwar. This was not difficult; just a bit tedious. The only slight issue is that there isn't a completely straightforward mapping for all of CMU's vowel phonemes with the available Tengwar diacritics. I largely based my mapping on this phonetic mode for English.
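The general shape of such a rule set can be sketched as a lookup table from phonemes to Private Use Area characters. The code points below are placeholders that merely fall inside the Private Use Area (U+E000 to U+F8FF); they are not the Free Tengwar Project's actual assignments, and the function name is hypothetical. Vowels, which would be written as diacritics, are ignored here.

```php
<?php
// PLACEHOLDER table for illustration only: these are NOT the real
// Free Tengwar Project code points, just values inside the Private Use Area.
$table = [
    'T' => "\u{E000}",
    'D' => "\u{E001}",
    'F' => "\u{E002}",
];

// Hypothetical helper: converts a list of (consonant) phonemes into a
// UTF-8 string by looking each one up in the table.
function phonemesToTengwar(array $phonemes, array $table): string
{
    $out = '';
    foreach ($phonemes as $p) {
        if (!isset($table[$p])) {
            throw new OutOfBoundsException("No mapping for phoneme $p");
        }
        $out .= $table[$p];
    }
    return $out;
}

$encoded = phonemesToTengwar(['D', 'T'], $table);
// Check that every character lies in the Private Use Area.
var_dump(preg_match('/^[\x{E000}-\x{F8FF}]+$/u', $encoded) === 1); // bool(true)
```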

At this point, only one small problem remained.

Displaying Tengwar Characters on the Web

What are the chances of most people having the Free Tengwar Project's fonts installed on their computer? Not good, to say the least. Fortunately, I knew going into this project that it is possible to embed fonts on Web pages. All I had to do was learn how to do it. Like every other step of this project, the Internet provided me with a few alternatives for doing the font embedding, but in the end, the easiest method by far was to use the Font Squirrel Web Font Generator, which, given a font file, produces a .zip archive containing converted font files and an example CSS stylesheet.

Note: A word of caution to anyone else who may be interested in using the Font Squirrel tool to create Tengwar Web fonts: In order for things to work properly, you must use "Expert mode" and change the Subsetting setting to "No Subsetting" or "Custom Subsetting" with the Private Use Area range specified.

I've mostly just tested this technique in Firefox and Chrome, but it seems to work reliably for displaying Tengwar on computers without any of the fonts installed. If you try it out in Internet Explorer (or any browser on Windows) and run into issues, please let me know.

There are still some issues I'm running into with some of these fonts, and I'll continue to play with the code. Specifically, I would love to switch over to the Tengwar Telcontar font, but it seems too finicky for general use.

Conclusion and Next Steps

If you've stuck with me until now, I commend you for probably being a pretty big nerd. I'm releasing what I have so far and calling it alpha software. Here's another link if you're too lazy to scroll back up to the top. I have every intention of continuing to add features. Here are some ideas for future additions:
  • Probably tweak vowel and diphthong output a bit more
  • Numeral support
  • Possibly implement orthographic transliteration for non-dictionary words
  • "Reverse" transliteration from Tengwar into IPA (I doubt the existence of a reliable way to take phonetic Tengwar back to English)
  • Transliteration from Tengwar to Latin characters using Sindarin and Quenya modes
    • This will require implementation of a "virtual keyboard" for inputting the Tengwar characters, unless anyone has a better idea
  • Support for other Tolkien alphabets (Cirth, possibly Black Speech/One Ring inscription style)
  • Possibly a phonetic German mode
  • Ultimately, transliteration from English into the Common Mode
As always, I have also released my code on GitHub for those of you who are interested.

Thursday, May 29, 2014

Back to Life

"Glider"
In honor of the life I'm attempting to breathe back into this blog, I thought I'd whip up a quick post about Life itself. Conway's Life, that is.

If you've never heard of it, Life is a cellular automaton, which is sort of like a virtual gameboard or grid where the pieces multiply or die off according to simple rules. These rules can be thought of as modeling prosperity or death from starvation and/or overcrowding. In Life, there is only one variety of cell and each spot on the board can either be occupied or not, with changes occurring according to the following rules:
  • New cells are spawned in empty cells with 3 populated neighbors
  • Cells with 2 or 3 neighbors remain "alive"
  • Cells with fewer than 2 or more than 3 neighbors "die", and disappear
These rules are applied to a "generation" (i.e. the current arrangement of the pieces on the board) to define the next generation. Once this new generation is derived, it completely replaces the previous generation and the entire process can start over. Since the process is completely deterministic based on the initial state (or first generation) and the rules in place, Life is sometimes called a zero-player game. Exciting!
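The rules above can be sketched as a single function that derives the next generation from the current one. This is an illustrative sketch rather than the code behind the simulator mentioned in this post; it assumes a rectangular grid of 0 (empty) and 1 (alive) values and treats everything beyond the edges as empty.

```php
<?php
// Hypothetical helper: computes the next generation of a Life grid.
// $grid is a rectangular array of rows containing 0 (empty) or 1 (alive).
function nextGeneration(array $grid): array
{
    $rows = count($grid);
    $cols = $rows > 0 ? count($grid[0]) : 0;
    $next = [];
    for ($r = 0; $r < $rows; $r++) {
        for ($c = 0; $c < $cols; $c++) {
            // Count the populated cells among the 8 surrounding positions.
            $neighbors = 0;
            for ($dr = -1; $dr <= 1; $dr++) {
                for ($dc = -1; $dc <= 1; $dc++) {
                    if ($dr === 0 && $dc === 0) {
                        continue; // skip the cell itself
                    }
                    $neighbors += $grid[$r + $dr][$c + $dc] ?? 0; // off-grid = empty
                }
            }
            $alive = $grid[$r][$c] === 1;
            // Birth with exactly 3 neighbors; survival with 2 or 3.
            $next[$r][$c] = ($neighbors === 3 || ($alive && $neighbors === 2)) ? 1 : 0;
        }
    }
    return $next;
}

// A "blinker": three cells in a row flip between horizontal and vertical.
$blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]];
foreach (nextGeneration($blinker) as $row) {
    echo implode('', $row), "\n"; // prints 010 three times
}
```

Because the new grid is built separately and only then returned, the whole generation is replaced at once, as the rules require.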

Why "play" a zero-player game? As it turns out, there are some pretty interesting patterns to be observed. As a computer programmer, I tend to think of Life as a 2-dimensional visual programming language, where the programs are certain deliberate arrangements of cells on the board that create interesting patterns or effects in the simulation.

The so-called "glider gun" is a good example of this sort of thing. It is a careful arrangement of cells which, over the course of 30 generations, produces a "glider" that "flies" off from a point near the center of the glider gun arrangement. The glider gun will repeat its pattern indefinitely, producing a constant stream of gliders. It's best to see it in action, which you can do by checking out my mediocre Life simulator (or just scroll down).

Tip: I never made this clear in the interface, but you can click any position on the grid below to toggle its life/death state.

I found this simulator while rummaging around on my server; I wrote it for fun back in January 2012. I'll admit you could probably search the Internet for Conway's Life and find a host of better online Life implementations. This is mine, though, and as always, feel free to check out the code.