To use dynamic attributes, define them in the corpus configuration file.

DYNAMIC must be set to the name of an internal function or to the name of a function from an external shared library.

DYNLIB must be set to “internal” or to the path of the shared library, accordingly.

Internal functions

- striplastn       (str, n) - returns str stripped of its last n characters
- lowercase   (str, locale) - returns str in lowercase (for any single-byte encoding and the corresponding locale)
- utf8lowercase       (str) - returns str in lowercase (for any utf-8 encoded string str)
- utf8uppercase       (str) - returns str in uppercase (for any utf-8 encoded string str)
- utf8capital         (str) - returns str with first character capitalized (for any utf-8 encoded string str)
- getfirstn        (str, n) - returns first n characters of str
- getlastn         (str, n) - returns last n characters of str (for any single-byte encoding)
- utf8getlastn     (str, n) - returns last n characters of str (for any utf-8 encoded string)
- getfirstbysep    (str, c) - returns prefix of str up to the character c (excluding)
- getnbysep     (str, c, n) - returns the n-th component of str, split on the delimiter c (the delimiter itself is excluded)
- getnchar         (str, n) - returns n-th character of str
- getnextchars (str,attr,n) - returns the n characters that follow the character attr in str
- getnextchar   (str, attr) - returns the character that follows the character attr in str
- url2domain       (str, n) - returns n-th component of the URL (0 = web domain, 1 = top level domain, 2 = second level domain)
- ascii  (str, enc, locale) - returns ASCII transliteration of the string according to the given encoding and locale

Examples of dynamic attributes defined with internal functions:

ATTRIBUTE   lemma {
          DYNAMIC    striplastn
          DYNLIB     internal
          ARG1       "2"
          FUNTYPE    i
          FROMATTR   lempos
          DYNTYPE    index
}
ATTRIBUTE   lc {
          DYNAMIC    lowercase
          DYNLIB     internal
          ARG1       "C"
          FUNTYPE    s
          FROMATTR   word
          DYNTYPE    index
          TRANSQUERY yes
}
ATTRIBUTE   tag {
         DYNAMIC     getfirstn
         DYNLIB      internal
         ARG1        "3"
         FUNTYPE     i
         FROMATTR    ambtag
         DYNTYPE     index
}
ATTRIBUTE   k {
         DYNAMIC     getnchar
         DYNLIB      internal
         ARG1        1
         FUNTYPE     i
         FROMATTR    tag
         DYNTYPE     index
}
ATTRIBUTE   g {
         DYNAMIC     getnextchar
         DYNLIB      internal
         ARG1        "g"
         FUNTYPE     c
         FROMATTR    tag
         DYNTYPE     index
}
ATTRIBUTE   g3 {
         DYNAMIC     getnextchars
         DYNLIB      internal
         ARG1        "g"
         ARG2        3
         FUNTYPE     ci
         FROMATTR    tag
         DYNTYPE     index
}
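
To make the behaviour of the simpler string functions concrete, the following is a minimal C sketch of what striplastn, getfirstn and getfirstbysep compute for single-byte strings. It is an illustration only, not the actual internal implementation: the sketch_ names, the caller-supplied output buffer, the handling of a missing separator and the sample values are all assumptions made for this example.

```c
#include <string.h>

/* Illustrative re-implementations only; names and signatures are hypothetical.
   Every function writes into a caller-supplied buffer of outsz bytes. */

/* Copy the first `keep` bytes of str into out, truncating to fit. */
const char *copy_prefix(const char *str, size_t keep, char *out, size_t outsz)
{
    if (outsz == 0) return "";
    if (keep >= outsz) keep = outsz - 1;
    memcpy(out, str, keep);
    out[keep] = '\0';
    return out;
}

/* striplastn: str without its last n bytes (single-byte encoding assumed). */
const char *sketch_striplastn(const char *str, size_t n, char *out, size_t outsz)
{
    size_t len = strlen(str);
    return copy_prefix(str, n < len ? len - n : 0, out, outsz);
}

/* getfirstn: the first n bytes of str (all of str if it is shorter). */
const char *sketch_getfirstn(const char *str, size_t n, char *out, size_t outsz)
{
    size_t len = strlen(str);
    return copy_prefix(str, n < len ? n : len, out, outsz);
}

/* getfirstbysep: prefix of str up to, but excluding, the first c.
   Assumption: if c does not occur, the whole string is returned. */
const char *sketch_getfirstbysep(const char *str, char c, char *out, size_t outsz)
{
    const char *sep = strchr(str, c);
    return copy_prefix(str, sep ? (size_t)(sep - str) : strlen(str), out, outsz);
}
```

With these definitions, sketch_striplastn("house-n", 2, buf, sizeof buf) yields "house". This mirrors what the lemma attribute above derives from lempos, assuming lempos values end in a two-character suffix such as "-n".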

Shared library

A shared library function must return const char*.

The following example function takes the document's year of publication and determines the epoch the document comes from.

  • the source code (epoch.c):
    #include <stdio.h>
    
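    /* Map a publication year (given as a string) to an epoch label. */
    const char * epoch (const char* year)
    {
           int y;
           /* sscanf returns the number of converted fields; without this check
              y would be read uninitialized when year is not a number.
              The "unknown" label is a fallback chosen for this example. */
           if(sscanf(year, "%d",&y) != 1) return ("unknown");
           if(y<1990) return ("before 1990");
           if(y<2001) return ("1990-2000");
           if(y<2005) return ("2001-2004");
           if(y<2009) return ("2005-2008");
           return ("2009 and later");
    }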
    
  • to compile the library use:
    gcc -Wall -fPIC -DPIC -shared -o epoch.so epoch.c
    
  • the important part from the corpus configuration file:
    STRUCTURE doc {
        ATTRIBUTE year
        ATTRIBUTE time {
            DYNAMIC    epoch
            DYNLIB     "/corpora/vert/greek/epoch.so"
            FUNTYPE    0
            FROMATTR   year
            DYNTYPE    index
            TRANSQUERY yes
        }
    }
    
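The epoch function can return string literals because every possible result is known in advance. When the result has to be computed, the returned const char* must still point to storage that outlives the call. The sketch below shows one way to do that with a static buffer. The decade function is hypothetical and not part of the example above, and the sketch assumes that the caller uses or copies the returned string before the function is called again.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: maps a year such as "1994" to the label "1990s". */
const char *decade(const char *year)
{
    /* static: only a pointer is returned, so the buffer must outlive the call.
       Each call overwrites the previous result, so this is not thread-safe. */
    static char buf[16];
    char *end;
    long y = strtol(year, &end, 10);

    /* Reject non-numeric input and years outside four digits. */
    if (end == year || y < 1000 || y > 9999)
        return "unknown";

    /* 1994 / 10 = 199, printed as "199" followed by "0s". */
    snprintf(buf, sizeof buf, "%ld0s", y / 10);
    return buf;
}
```

Returning a pointer to a non-static local array would be an error here, because that array ceases to exist as soon as the function returns.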

The dynamic attribute is created when the corpus is compiled with encodevert. To set up additional dynamic attributes, you do not need to recompile the whole corpus: adjust the configuration file and create the dynamic attributes with mkdynattr:

mkdynattr <corpus> <dynattr>
mkdynattr gkwac0.5 doc.time