This database is distributed as searchable CSV files. The quickest way to get an addition or correction merged is to fork the repo (or clone it locally), edit the relevant CSV file, and send a pull request.
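As a rough illustration of searching a local clone, the sketch below scans every cell of one CSV file for a substring. The function name `search_csv` and the UTF-8 encoding are assumptions made for this example; the column layout is deliberately not assumed, so the search covers all cells.

```python
import csv


def search_csv(path, needle):
    """Yield (row_number, row) for each row with a cell containing `needle`.

    Assumes the file is UTF-8 encoded (an assumption, not stated by the repo).
    """
    with open(path, newline="", encoding="utf-8") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            if any(needle in cell for cell in row):
                yield row_number, row
```

Calling `list(search_csv("some_file.csv", "電話"))` would return the matching rows together with their 1-based row numbers; the file name here is a placeholder.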
The focus is on antedating and amending entries where the lemma is still current in everyday usage. Triage of items compiled in Yang's bibliography is underway. Editorial history and argumentation can be accessed via git blame.
This database is designed to be consulted alongside Huang and Han yü ta tz'u tien. A solidus ('/') indicates that the information is the same as in (1) the previous entry, (2) Huang or (3) Han yü ta tz'u tien.
Lemma
To facilitate cross-checking, the arrangement of lemmas is that of Mair et al. (2003) with the following exceptions:
words whose head characters share the same sequence of letters and tone are sorted by subsequent characters;
if that still leaves a tie, they are sorted by the stroke count of the characters concerned.
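The two tie-breaking rules above can be sketched as a sort key. This is only an illustration under stated assumptions: the `Syllable` record and its fields are hypothetical, "sorted by subsequent characters" is read here as comparing the letters and tone of the later characters, and the base arrangement of Mair et al. (2003) is not reproduced.

```python
from typing import NamedTuple, Tuple


class Syllable(NamedTuple):
    """Hypothetical record for one character of a lemma."""
    letters: str  # romanized letters without the tone
    tone: int     # tone number
    strokes: int  # stroke count of the character


def sort_key(lemma: Tuple[Syllable, ...]):
    """Key for lemmas whose head characters may share letters and tone."""
    head, rest = lemma[0], lemma[1:]
    return (
        head.letters,
        head.tone,
        # First exception: compare the subsequent characters.
        tuple((s.letters, s.tone) for s in rest),
        # Second exception: fall back to stroke counts.
        tuple(s.strokes for s in lemma),
    )
```

With this key, `sorted(lemmas, key=sort_key)` places lemmas with identical readings in ascending order of stroke count, which is one plausible reading of the second rule.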
Ideographs are given in their common form as found in the character set of the system font coded as TC on modern computers; this tends to be the only form attested throughout the period covered here.
Numerals following a lemma refer to the different senses of a homonymous word. They form a superset of the senses defined in Huang.
Phonetic loanwords and anatomical terms are collected in separate appendices. For loanwords, see On Diglossia.
Glosses serve to disambiguate and are set in roman type.
Domain classification is set in italic type.
Assignment to Word Class follows the analysis of Huang–Shi (2016) as corrected in the Grammatical Appendix.
Year
The first entry for a lemma represents the first known attestation. When a date is given, it is generally earlier than the earliest quotation in Huang except in the case of postdating.
When the publication date and composition date of a source differ, the dating styles of the Middle English Dictionary and OED3 are adopted.
When a source has been added from the documentation of Han yü ta tz'u tien, only the political period is available. Such fuzzy dating will gradually be replaced by more precise dates.
Quotation
To make full-text search possible, the ideographs used are those contained in the character set of the system font on modern computers that are closest to a diplomatic transcription of the source.
The typography of the source is reproduced to the extent that the resources of HTML allow.
A blank means the scholar who antedated the word didn't supply the evidence in their writings.
A question mark ('?') means the word has not been found in the source cited by the scholar who antedated the word.
For traditional critical symbols, see West (1973).
Source
Dictionary evidence is treated as a primary rather than secondary source, and represents one single attestation instead of a statement about contemporary usage.
When the earliest source for a lemma is written by a non-native speaker, a second quotation from the earliest native source is added if one does not exist in Huang.
A plus sign ('+') following a source means the word is also attested in at least one other source dating from the same year.
A question mark ('?') means the scholar who antedated the word didn't supply the source in their writings.
A question mark ('?') following a source means the scholar who antedated the word didn't clearly specify the source in their writings and that the one given here was inferred from their bibliography.
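For anyone processing the Source field programmatically, the markers described above could be separated from the source name roughly as follows. This is a minimal sketch: `SourceField` and `parse_source` are hypothetical names, and it assumes the markers always sit at the end of the cell, so a title that genuinely ends in '?' or '+' would be misread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceField:
    name: Optional[str]   # None when the scholar supplied no source
    inferred: bool        # trailing '?': source inferred from the bibliography
    also_attested: bool   # trailing '+': attested in another source, same year


def parse_source(raw: str) -> SourceField:
    """Split a Source cell into its name and trailing markers (assumed layout)."""
    text = raw.strip()
    if text == "?":
        # A bare question mark: no source was supplied at all.
        return SourceField(None, False, False)
    inferred = False
    also_attested = False
    # Peel markers off the end, in whatever order they appear.
    while text and text[-1] in "+?":
        if text[-1] == "+":
            also_attested = True
        else:
            inferred = True
        text = text[:-1].rstrip()
    return SourceField(text or None, inferred, also_attested)
```

For example, `parse_source("Some Title +")` yields the name `"Some Title"` with `also_attested` set, while `parse_source("?")` yields no name and no flags.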
If you are in a place with internet restrictions, contact your local authorities and ask them to unblock the site for you. In the meantime, you can send the contributions to 67616464787179717a6054607b7a7f617a6760787166397b7a39607c713976617a703a777b79.