Bridge: Customizable Vocabulary Lists

0. What happens before you begin

Someone (maybe you!) has processed a plain text file (*.txt) of your text using Bridge/Lemmatizer. This produces a .csv file (comma-separated values) that reformats your text vertically in a sheet and automatically lemmatizes any word that can be unambiguously linked to a single dictionary headword (lemma).

Note: the program may improperly lemmatize a few ambiguous words. In particular, forms in multa- may be assigned to MVLTA/1 (a noun, "punishment") instead of being left unlemmatized as ambiguous.

1. Preparing Your Lemmatization Sheet

1. Import the .csv for your TEXT into your spreadsheet program and save it as a spreadsheet (.xlsx). [1-minute YouTube Tutorial]

2. Rename your sheet as TEXT (will be Sheet1 by default).

3. Import a DICTIONARY sheet into your Spreadsheet by dragging the DICTIONARY sheet into the sheet bar at the bottom of the TEXT Spreadsheet.

You can download Bridge DICTIONARY, as well as LIST spreadsheets at GitClassical/Bridge.

4. Add column headers. Add at least PRINCIPAL_PARTS and SHORT_DEFINITION. You may also add the Bridge columns LONG_DEFINITION, TEXT_SPECIFIC_DEFINITION, and TEXT_SPECIFIC_PRINCIPAL_PARTS — as well as any other columns for personal use (e.g., QUESTIONS, PROBLEMS, etc.).

5a. Link your TEXT sheet to the DICTIONARY by adding these formulae to the first row of data in the correct columns.

a. PRINCIPAL_PARTS: =VLOOKUP(A2,DICTIONARY!A:F,2,FALSE)
b. SHORT_DEFINITION: =VLOOKUP(A2,DICTIONARY!A:F,5,FALSE)
c. LONG_DEFINITION: =VLOOKUP(A2,DICTIONARY!A:F,6,FALSE)

5b. Populate the rest of the rows by selecting the cells and dragging to the end of the text (you can also copy and paste the formulae). When you populate the spreadsheet, it will automatically bring in principal parts and definitions for words that have been lemmatized.
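For readers who want to see what the formulae in step 5a are doing, here is a minimal Python sketch of the same exact-match lookup. It is only an illustration: the helper names (vlookup, fill_row) and the two sample DICTIONARY rows are assumptions made for this example, and the contents of columns C and D are left empty because this guide does not say what they hold.

```python
# Hypothetical miniature DICTIONARY sheet, columns A..F.
# A = TITLE, B = PRINCIPAL_PARTS, E = SHORT_DEFINITION, F = LONG_DEFINITION,
# matching the column numbers 2, 5 and 6 used in the formulae of step 5a.
# The definitions below are invented sample data, not Bridge data.
DICTIONARY = [
    ["AMO", "amo amare amavi amatus", "", "", "to love", "to love, like, be fond of"],
    ["LEGO/2", "lego legere legi lectus", "", "", "to read", "to gather, choose, read"],
]


def vlookup(key, table, col_index):
    """Exact-match lookup, like VLOOKUP(key, table, col_index, FALSE)."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]  # spreadsheet columns are 1-based
    return None  # a spreadsheet would show #N/A here


def fill_row(title):
    """Return the three linked cells for one TITLE in the TEXT sheet."""
    return {
        "PRINCIPAL_PARTS": vlookup(title, DICTIONARY, 2),
        "SHORT_DEFINITION": vlookup(title, DICTIONARY, 5),
        "LONG_DEFINITION": vlookup(title, DICTIONARY, 6),
    }
```

Calling fill_row("LEGO/2") returns the three cells for that TITLE, while a blank or unknown TITLE returns None in every cell, which corresponds to the lookup error shown in rows that still need to be lemmatized.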

2. Getting to Know the Lemmatization Sheet

You now have a lemmatization spreadsheet that can be worked on in Excel, Google Docs, or any major spreadsheet program. By default, unambiguous words (i.e. non-homonyms) will have been lemmatized by the Bridge Lemmatizer program, allowing you to focus on the real work that requires human judgment.

The Columns

TITLE: Where you will lemmatize the text by adding the UNIQUE ID that matches the inflected form in TEXT (Column C) with the word in the DICTIONARY. Blank cells (which need TITLEs) should appear yellow.

LOCATION: the book, chapter, poem, line, or section in which the word appears. This will be automatically created by the program that generated the spreadsheet but should be checked.

Note: the program has converted the periods in traditional locations (e.g., 1.1.1) to underscores (e.g., 1_1_1). This prevents data loss within the spreadsheet program (1.10 will quickly become 1.1).
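The conversion, and the data loss it avoids, can be sketched in a few lines of Python. The function names are assumptions made for this illustration; they are not part of the Bridge program.

```python
def location_to_sheet(location: str) -> str:
    """Turn a traditional location such as 1.1.1 into the sheet form 1_1_1."""
    return location.replace(".", "_")


def location_from_sheet(location: str) -> str:
    """Turn the sheet form 1_1_1 back into the traditional 1.1.1."""
    return location.replace("_", ".")


def as_spreadsheet_number(location: str) -> str:
    """Show what happens when a two-part location is read as a number."""
    return str(float(location))  # "1.10" is stored as the number 1.1
```

With periods, as_spreadsheet_number("1.10") gives "1.1", so chapter 1, section 10 becomes indistinguishable from chapter 1, section 1. With underscores, the location stays text and survives as 1_10.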

SECTION: The lemmatizer will parse sentences in your text and create a running count for the sentences.

RUNNINGCOUNT: allows you to sort the spreadsheet back to text order. If you add any words, be sure to add a value between those in the rows above and below.

TEXT: your text runs down the sheet in this column.

NOTE: Latin words that end in -N or -QUE might have been split into multiple rows. If so, lemmatize the actual form and delete the superfluous row. For example, if "relinque" has been split into rows with "relin" and "que", recombine and lemmatize the proper form. But also note that enclitics (-que ~ et) should be lemmatized separately; so if, conversely, you have a TEXT form of iustamque, please split that into two rows, lemmatize, and add an intermediate entry for the Running Count (e.g., 1014, 1014.5).
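The two repairs described in this note can be sketched in Python as follows. This is an illustration only: rows are modeled as plain dictionaries with the column names used in this guide, the function names are assumptions, and deciding whether a given -que is a real enclitic (iustamque) or part of the word (relinque) remains a human judgment.

```python
def split_enclitic(row, enclitic="que"):
    """Split e.g. iustamque into iustam + que.

    The second row receives an intermediate running count (n + 0.5) so the
    sheet can still be sorted back into text order.
    """
    text = row["TEXT"]
    if not text.lower().endswith(enclitic) or len(text) <= len(enclitic):
        return [row]  # nothing to split
    first = {**row, "TEXT": text[: -len(enclitic)], "TITLE": ""}
    second = {
        **row,
        "TEXT": text[-len(enclitic):],
        "TITLE": "",
        "RUNNINGCOUNT": row["RUNNINGCOUNT"] + 0.5,
    }
    return [first, second]


def merge_rows(first, second):
    """Recombine a wrongly split form such as relin + que into relinque.

    The merged row keeps the first row's running count; the second row is
    the superfluous one to delete.
    """
    return {**first, "TEXT": first["TEXT"] + second["TEXT"], "TITLE": ""}


def text_order(rows):
    """Sort rows back into text order by RUNNINGCOUNT."""
    return sorted(rows, key=lambda r: r["RUNNINGCOUNT"])
```

Both helpers blank the TITLE so that the new or merged rows show up as cells that still need to be lemmatized.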

PRINCIPAL_PARTS: the principal parts of the word; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

SHORT_DEFINITION: a succinct definition of the word; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

LONG_DEFINITION: a more expansive definition of the word; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

LOCAL_DEFINITION: [optional] if you are adding custom definitions for your text, they will display here; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY in the "TEXT" page. Instead, you will create new definitions after you have lemmatized the text and formatted it into a Glossary.

3. Lemmatize your passage by identifying each word and adding new vocabulary to the DICTIONARY.

Read through every entry, manually checking those words that have been lemmatized and lemmatizing the ambiguous forms that were not auto-lemmatized.

BEFORE YOU BEGIN WATCH THIS SHORT (SILENT) VIDEO TO SEE LEMMATIZATION IN ACTION

https://youtu.be/yum-qlHFMSg

Lemmatizing requires you to add the correct TITLE to the TITLE Column (A). A TITLE is either a Known Lemma or a New Lemma. First we will discuss Known Lemmas, then how to handle New Lemmas.

At the start you'll need to find the correct TITLEs in the DICTIONARY sheet. When you start typing in a cell in Column A, possibilities will be suggested if that TITLE already appears in the TEXT sheet. After a little while you'll have a sense of what form a TITLE may take and the process can move quite quickly.

Note that TITLES follow a standard orthography and format:

- TITLES are always ALL-CAPS
- U's are always V's; J’s are I’s; e.g., the TITLE for “abjuro” is ABIVRO.
- Homonyms are distinguished by /1, /2, etc. These are usually ranked in a rational order (nouns, adjectives, numbers, pronouns, verbs, adverbs, prepositions, other), but unless you are absolutely certain about the TITLE, please verify it by looking at the DISPLAY LEMMA and DEFs (after you save your file, these will populate automatically).
- There are a few other suffixes to distinguish homonyms: e.g., /N for proper names; /A for proper adjectives.
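The spelling rules in this list can be sketched in Python. This is a hedged illustration: the function names are assumptions, the input must already be the dictionary headword (the code does not lemmatize an inflected form), and choosing the right homonym suffix still requires checking the DICTIONARY.

```python
def to_title(headword: str, suffix: str = "") -> str:
    """Apply the TITLE orthography: ALL-CAPS, U written as V, J written as I.

    `suffix` is an optional homonym marker such as "/1" or "/N"; picking it
    is a human decision, so it is passed in rather than guessed.
    """
    title = headword.upper().replace("U", "V").replace("J", "I")
    return title + suffix


def looks_like_title(title: str) -> bool:
    """Rough check that a string follows the spelling rules above.

    Only the part before the first "/" is checked; this does not confirm
    that the TITLE actually exists in the DICTIONARY.
    """
    base = title.split("/", 1)[0]
    return base != "" and base == base.upper() and "U" not in base and "J" not in base
```

For example, to_title("abjuro") gives ABIVRO and to_title("multa", "/1") gives MVLTA/1. A check like looks_like_title can catch a TITLE typed in lower case or with a U before the lookup formulae fail silently.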

General principle: lemmatize to the most general form that will be accessible to a novice reader.

For example, say you encounter the word legente. It is possible that it could be used substantively to indicate a reader (and it is a handful of times in common ancient texts), but even if you were lemmatizing one of those moments, it would be better for developing the reader's lexical competency to lemmatize to the verb (LEGO/2).

From this general principle, a few general practices:

* participles, supines, etc. to their verbs
* substantives to their adjectives
* rare orthographies to the more typical form (if the rare form is somewhat common, we can add a note to the DISPLAYLEMMA; indicate this in PROBLEM)

For more details about Bridge lemmatization principles, please visit this page.

You can find complete instructions for lemmatizing your text here.

ADDING NEW LEMMAS: if a word is not in the DICTIONARY, first check and triple check that it is not in the DICTIONARY. Consider different spellings; try searching for a principal part (without macrons). If you are certain that the word is not in the DICTIONARY, then add it at the bottom of DICTIONARY. Fill in the PRINCIPAL_PARTS, SHORT_DEFINITION, LONG_DEFINITION, and PART_OF_SPEECH. Don't worry about the other columns; they will be generated automatically or must be added by the Project Director. E.g., if the proper name "Bevis" appeared in your text but there is no BEVIS/N entry in the DICTIONARY, then at the bottom of the DICTIONARY sheet, add:

ADDING DATA TO BLANK TITLES (this is VERY UNLIKELY): if a word appears in the DICTIONARY but without any other information (i.e., the TITLE is there, but it lacks dictionary entries and definitions), you can add the DISPLAY LEMMA (with macrons), SHORTDEF, and LONGDEF. I'll be able to harvest these. But note: if there is already information present, it cannot be harvested. Instead, you must make a note in the PROBLEM Column of DICTIONARY (Column N).

E.g., if the proper name "Bretus" appeared in your text and BRETVS/N appears in the DICTIONARY but without any other information, you would add the dictionary entry and definition in the row.
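Before adding a new lemma or filling in a blank TITLE, the "check and triple check" search described above (different spellings, principal parts without macrons) can be sketched in Python. This is an assumption-laden illustration: the function names and the two sample rows are invented for the example, and a real search would run over the full DICTIONARY sheet.

```python
import unicodedata


def fold(text: str) -> str:
    """Normalize a string for searching.

    Removes all combining marks (macrons included), lower-cases the text,
    and writes u as v and j as i so a query matches TITLE spelling.
    """
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.lower().replace("u", "v").replace("j", "i")


def find_candidates(query, dictionary_rows):
    """Return TITLEs whose TITLE or PRINCIPAL_PARTS contain the query."""
    needle = fold(query)
    hits = []
    for row in dictionary_rows:
        haystack = fold(row["TITLE"] + " " + row["PRINCIPAL_PARTS"])
        if needle in haystack:
            hits.append(row["TITLE"])
    return hits


# Invented sample rows: one full entry and one blank TITLE.
SAMPLE_ROWS = [
    {"TITLE": "AMO", "PRINCIPAL_PARTS": "amō amāre amāvī amātus"},
    {"TITLE": "BRETVS/N", "PRINCIPAL_PARTS": ""},
]
```

Here find_candidates("amavi", SAMPLE_ROWS) finds AMO through its third principal part, and find_candidates("Bretus", SAMPLE_ROWS) finds the blank BRETVS/N entry. An empty result for a word such as "Bevis" is a hint, not proof, that the word is missing, so alternative spellings are still worth trying.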

If you need to add dictionary entries for Latin texts, the fastest and most accurate way to do so is to copy them from LaNe* which is available on Logeion.

Figure 4. Logeion

* LaNe = Woordenboek Latijn/Nederlands, 6th revised edition 2014, a Latin-Dutch translation dictionary, originally based on Pons Globalwörterbuch Lateinisch-Deutsch (Klett) but with full coverage of all entries also contained in the Oxford Latin Dictionary. It is the current gold standard for Latin vowel quantities.

ADDING CUSTOM DEFINITIONS [Optional]: if you are adding custom definitions for your text, be sure that the TEXT_SPECIFIC_DEFINITION for each word is the best definition for the word. Modify these as needed.

Logeion is also a great place to find/copy definitions (but be thoughtful about this; make sure that you include definitions relevant to your text).

ADDING CUSTOM PRINCIPAL PARTS [Optional]: if you are adding custom principal parts for your text, be sure that the LOCAL_PRINCIPAL_PARTS for each word are the best principal parts for the word. Modify these as needed.

4. Submit your lemmatization sheet

When you have finished lemmatizing your text, submit it to The Bridge. We will harvest the new information in your local DICTIONARY and add your text information to The Bridge!

Lemmatization Principles

Substantives are lemmatized to the adjective TITLE unless the substantive has an independent meaning that is unintelligible from the adjective. This includes ethnonyms (e.g. ACARNANIS/A > ACARNANES/N; ROMANVS/A > ROMANI/N), even if the adjective form is unattested.
Participles (including perfect passive) are lemmatized to the verb TITLE unless the participle has an independent meaning that is unintelligible from the verb or the verb is not extant independently.
Cardinal and ordinal numbers are lemmatized to the cardinal TITLE for numbers greater than 3.
For compound numbers, lemmatize each component, e.g. 72 = two words, SEPTVAGINTA and DVO.
Collateral forms (e.g. different spellings, deponent forms with the same meaning) are lemmatized to the main TITLE where possible.
Abstractions (e.g. "Luxury") are not generally lemmatized to a separate proper name TITLE but to the general TITLE for the noun.

Morphological Categories for Latin Data

Noun
1. First Declension
2. Second Declension
3. Third Declension
4. Fourth Declension
5. Fifth Declension
6. Irregular or Indeclinable
Adjective
1. 1st/2nd Declension
3. Third Declension
Number
Pronoun
Verb
1. First Conjugation
2. Second Conjugation
3. Third Conjugation and Third Conjugation -io
4. Fourth Conjugation
5. Irregular
Adverb
Preposition
Conjunction
Interjection
Idioms
Prefixes & Suffixes
Abbreviations

Outstanding Editorial Questions/Data inconsistencies

Provide full principal parts (amo amare amavi amatus) or abbreviated (amo -are -avi -atus)?
What to do about fourth principal parts of verbs: -us or -um?
Should non-idiomatic but common entries like those for salve or vale or XAIPE receive their own entry or be combined into main entry for verb? With salve, etc. mentioned in definition?