WebDAV for TclHttpd

This is a simple WebDAV extension that lets TclHttpd work as a WebDAV server.

News

  • August 04 – WebDAV for TclHttpd is the program of the week (POTW). Thanks a lot!
  • July 04 – Released version 0.1a

Introduction

This is a simple WebDAV extension which can be used in conjunction with the TclHttpd web server to build a WebDAV server.

You have embedded TclHttpd in your application to provide web access? Why not add WebDAV support as well?

This is considered a prototype, so be warned: it is in alpha status.

Features

  • WebDAV-Support for files in the webserver-docroot
  • WebDAV-Support for vfs-mountable files (such as starkit-, zip- or tar-files)
  • Quite easy to extend
  • Is tested with:

So what is it? By putting the provided Tcl files (see Download) into your custom directory, you can turn your TclHttpd web server into a WebDAV server. For example, you can expose vfs files to be accessed via WebDAV. Suppose you have a zip file. Put it in your docroot and you can browse through the zip file with a WebDAV client (like Internet Explorer or Konqueror).

Maybe you have some starkits: simply connect a Web folder to your WebDAV server and browse through the contents of the starkit (see image below).

Though this implementation is quite useful, you should be aware of several

Drawbacks

  • No support for locking
  • No support for authentication
  • No support for versioning (so it’s kind of WebDA ;-)
  • No threading support (at least: I haven’t tested it)
  • Testing is very incomplete (COPY of collections??)
  • The Depth header is mostly ignored
  • Many gaps in the implementation

Keep in mind that this was just a fun project for me and not a serious WebDAV implementation. One more drawback is the way path prefixes are mapped to the URL prefix handler. Maybe I will change that later.

Preconditions

WebDAV for TclHttpd needs the following modules:

If you get the latest Tcl distribution, everything except TclHttpd should already be in place.

Installation

To make your TclHttpd WebDAV-aware simply follow these steps:

  • Download webdav_01a.zip
  • Extract the zip-file into the ‘custom’-directory of your TclHttpd-installation
  • Create a directory in the ‘htdocs’-directory that should provide WebDAV-access (e.g. dav)
  • Put some files in that directory
  • Create another directory (let’s say kit) in which you may place starkit, zip or tar files (be sure to use the extensions .kit, .zip or .tar respectively; otherwise the files will not be recognized, see webdav_vfs.tcl)
  • Adapt webdav.conf to your needs (if you follow this example and use the given directory names, everything should work with the provided webdav.conf)

That’s it.

Now you can use Internet Explorer to create/open a Web folder with this URL:

 http://your-server:your-port/kit/  (or /dav/)

If you have a standard installation of TclHttpd use: http://localhost:8015/dav/

Remember to check “Open as webfolder”.
Depending on the content in that directory you could see something like this:


Screenshot of WebDAV, seen from Windows Explorer

This image shows a mapped Web folder in the usual Windows Explorer. Here I am browsing inside the
file structure of patience.kit (an implementation of the patience card game that I happened to have lying around; you can download the starkit from [1]).

Configuration

To configure WebDAV, have a look at webdav.conf. Its content is explained in a little more detail in the next chapter.

Extend WebDAV

Maybe you even want to extend WebDAV for other purposes. Simply create a file like webdav_mymodule.tcl inside the TclHttpd-custom-directory. Add the following lines to webdav.conf.

webdav_resource /your-path {
  filename custom/webdav_mymodule.tcl
  namespace my::webdav::module
}

When TclHttpd starts, it sources the files in the custom directory. Sourcing webdav.tcl evaluates webdav.conf, and all resources listed there are read in as well.
The function webdav_resource takes two parameters. The first one is the absolute path in your htdocs (this is used to install the handler with Url_PrefixInstall). The second parameter contains a list of key-value pairs. filename and namespace must be given so that webdav can automatically source the files and call the corresponding functions. More key-value pairs can be added and used by your own module.

Your module has to provide a few functions to get working WebDAV functionality. These are:

  • your_namespace::GET
  • your_namespace::PROPFIND

Furthermore, you can implement any other function that WebDAV needs according to RFC 2518.
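How webdav.tcl actually calls a module is not shown here, so the following is only a hypothetical sketch of the idea: a minimal module skeleton for the namespace from the webdav.conf example, plus a dispatcher that calls <namespace>::<METHOD> when the module defines that procedure. The proc name webdav_dispatch, the args-style signatures and the returned strings are assumptions, not the interface of webdav.tcl.

```tcl
# --- webdav_mymodule.tcl (hypothetical skeleton) ---
namespace eval ::my::webdav::module {}

# The argument list webdav.tcl really passes is not documented here,
# so these procs simply accept any arguments.
proc ::my::webdav::module::GET {args} {
    return "GET handled with: $args"
}
proc ::my::webdav::module::PROPFIND {args} {
    return "PROPFIND handled with: $args"
}

# --- hypothetical dispatcher, not taken from webdav.tcl ---
# Calls <namespace>::<METHOD> if the module defines it, else reports 501.
proc webdav_dispatch {ns method args} {
    set cmd [namespace which -command ${ns}::$method]
    if {$cmd eq ""} {
        return "501 Not Implemented"
    }
    return [$cmd {*}$args]
}

puts [webdav_dispatch ::my::webdav::module GET /your-path/file.txt]
# GET handled with: /your-path/file.txt
puts [webdav_dispatch ::my::webdav::module LOCK /your-path/file.txt]
# 501 Not Implemented
```

Looking the procedure up by name keeps the module free of any registration code: adding a method is just defining one more proc in the namespace.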

Download

Examples

You want to give it a try but don’t have a WebDAV-client? Why not use webdav_vfs from tclvfs?

Assume you have put a starkit.kit inside htdocs/kit. Open a Tcl shell and mount the resource (in this example, a URL that is accessed through the webdav_vfs handler):

package require vfs
vfs::webdav::Mount http://localhost:8015/kit/ kit
cd kit
glob *  ;# yields a list of file-names in the kit-directory
# assume you have a starkit there (e.g. patience.kit, see image above)
cd patience.kit
# you're now inside the starkit-file
glob *  ;# returns a list of files/directories inside the starkit

You can do the same with zip or tar files. Isn’t that crazy? You access
zip files via WebDAV. I’m quite astonished by this :-) and I’m having a lot of fun with it.

Be warned. The webdav-client implementation needs at least as much work as this webdav-server implementation :-)

History

29. Jul. 2004

Released version 0.1a (damn, IE is such a nitpicky DAV client). Removed the property lastaccessed!

28. Jul. 2004

Released initial version (0.1)

References

Copyright/License

This software is copyright © 2004-2010 by Stefan Vogel.

This software is released under GPL.

Tgdbm library for Tcl (Version 0.5)

Tgdbm is an easy-to-use Tcl-wrapper to the GNU-dbm-library (gdbm).

Overview

Tgdbm provides an easy to understand interface to the
GNU-dbm-library (gdbm).
Gdbm uses extendible hashing and stores key/value pairs, where each key must be unique (gdbm can be downloaded from the GNU website; there
is also my Windows port of gdbm).
Though gdbm provides compatibility with ndbm and dbm, only the gdbm commands are supported in Tgdbm.

Furthermore you can use Tgdbm for transparently accessing and storing Tcl arrays (persistent arrays). An array is attached to the gdbm file as a handle. With this you can set an array entry, which is stored or updated transparently in the corresponding gdbm file.

Tgdbm is meant to be used in Tcl applications that have to store a small or medium amount of data. In many cases there is not enough data to justify a “real” database.
But even though there is only a small amount of data, you often need an efficient way to store it (rather than just writing it to a plain text file).

Because Tgdbm is provided as a loadable Tcl-Module, it can be easily
integrated into any Tcl-Application.

Download

You can download Tgdbm with the following links:

History

14. April 2005

Released Version 0.5:

Yes, it’s still 0.5, but with some improvements. All those fixes were sent to me by Thomas Maeder (thanks a lot). Have a look at the file CHANGES.txt inside the distribution to see what changed.

9. Jan. 2004

Released Version 0.5:
Persistent arrays were added to Tgdbm. Because Tcl arrays (which have unique keys) and gdbm key/value pairs (which also have unique keys) are nearly equivalent concepts, they are now combined to provide transparent handling of persistent arrays.
You can simply attach an array-name to a gdbm-file. Afterwards every operation on the array (read/write/unset) is traced and the key/values are automatically fetched/stored or updated/deleted in/from the gdbm-file.
For further information see README.txt.

Cleanup and restructuring of the C-Code, added sync-command …

1. Feb. 2000

Released Version 0.4

A quick and simple example

Even though the Tgdbm commands should be easy enough (if you know the gdbm library), a few examples should help you get started immediately.

package require tgdbm
proc store_array {file data} {
    upvar $data dat
    # create file if it doesn't exist
    set gdbm [gdbm_open -wrcreat $file]
    foreach entry [array names dat] {
        $gdbm -replace store $entry $dat($entry)
    }
    $gdbm close
}

You can also try the file tests/demo.tcl, which implements a simple gdbm file viewer. This viewer stores its configuration options (like colors or window positions) in option.gdbm (like an INI file).
The gdbm viewer needs the tablelist widget by Dr. Csaba Nemethi (which can be obtained from: http://www.nemethi.de).

More examples.

Documentation

More documentation

Holzmichel II

Apparently old Holzmichel isn’t really all that alive after all. While the mailing list and Golem announced that version 22 of the operating system Emacs would be released by the end of April, there is now, almost mid-May, no sign of any release at all. Except perhaps the hint from Richard Stallman himself that “Monday” would be a date for the 22 branch. Not that creating a 22 branch would be much of a ray of hope. Even more worrying to me is the date: Monday!? Uh-oh.

Hmmm, maybe in the download list? Nothing … all 21. The Emacs homepage also declares 21 as the latest stable version (dated 6 February 2005, no less).
To be fair, I should mention that the 22 branch does at least exist in CVS, but further announcements or any other hype are completely absent.

Am I really the only one who has already camped out with a sleeping bag in front of the Emacs site, so as to be there in time when the doors open for version 22?

Trackback

For a long time this was a mystery to me: what is a trackback?

That was probably because I didn’t have a real blog. Now, with a real blog (WordPress), I have an input field for a trackback URL in the post editor. What to do with it?

And since the extremely helpful article on trackbacks is still heavy going for some people, I’ll explain it here briefly in my own words:

Recently I was reading the news on Golem, and in particular I stumbled across the article “GNU Emacs 22 noch im April 2007” (GNU Emacs 22 still due in April 2007).

Being a passionate Emacs fan, I was inspired by the article to write an entry of my own in my blog.
Since I got the news from Golem, link to the Golem article in my post, and readers of Golem might also like to read my post (which of course I’m suuure they do), I entered Golem’s trackback URL in the trackback field of my WordPress editor:
http://www.golem.de/trackback/51600 (you get it by right-clicking the trackback link and choosing “Copy Shortcut”).

Then I published my post and, bang, look at that: my blog entry shows up under Trackbacks on the Golem article. Very handy.
It’s like commenting on someone else’s content, only in your own blog. And I didn’t want to sink to the level of the comments on that Golem article.

It’s that simple. Incidentally, since I don’t want thousands of pointless trackbacks from every spammer in the world, comments on my blog are moderated. And lo and behold: trackbacks to my blog entries land in the comment queue, and I can simply approve the ones I like.
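For the curious: behind the scenes, a trackback ping is just an HTTP POST of a few form fields (url, title, excerpt, blog_name) to the trackback URL, as described in the TrackBack specification. Here is a rough Tcl sketch of what WordPress sends on your behalf; trackback_body is a made-up helper name, and the post URL, title, excerpt and blog name are placeholders. Only the body is built; the actual request is shown commented out.

```tcl
package require http

# Hypothetical helper: build the form-encoded body of a trackback ping.
proc trackback_body {url title excerpt blog_name} {
    return [::http::formatQuery url $url title $title \
                excerpt $excerpt blog_name $blog_name]
}

# Placeholder values for the post that links to the Golem article.
set body [trackback_body http://www.example.org/2007/05/my-emacs-post \
              "My Emacs post" "Emacs 22 is still not released ..." \
              "Example blog"]
puts $body

# Sending the ping (not executed here) could look like this:
# set token [::http::geturl http://www.golem.de/trackback/51600 \
#                -query $body \
#                -type "application/x-www-form-urlencoded; charset=utf-8"]
# puts [::http::data $token]   ;# an XML <response>, <error>0</error> on success
# ::http::cleanup $token
```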

If that was too short for you, please help yourself to the usual sources:

Old Holzmichel is still alive

Just recently I was still wondering how long I was supposed to keep sitting on my old Emacs 21. After all, you get chatted up almost daily by attractive young things.

And every time I ask myself: should I give in to this temptation? How much longer will my old lady (my editor, that is :-)) hold out? Now and then I have indeed been eyeing a new one. But all those who come along again with the line “Eight Megabytes And Constantly Swapping” (from the early days of Emacs, when you had to make do with 640 KB of memory) and therefore prefer, say, Eclipse, have no right to argue at all. So up to now I have stayed faithfully with my Emacs. Only time was slowly working against it. No visible development since 2001 … that isn’t good even for the best editor in the world.

But folks, for everyone who uses Emacs because it is an Editor for middle-aged computer-scientists, the all-clear has now been given:


Alexa Traffic Rank

Now I’m really surprised.

Over the last few months I have dropped by Alexa every now and then to see what traffic rank www.vogel-nest.de has. The traffic rank was of course always beyond good and evil. No wonder, since the regularity of my blogging leaves a lot to be desired.

What I always found particularly sad at Alexa was the message in the chart: “Not in top 100,000”. But no pain, no gain, and for lack of visitors I can’t keep up with the “Top 100,000” sites anyway.

So I’m all the more astonished at what I see today:

WebImageSnap for jQuery

With “WebImageSnap” you can embed images on web pages as links. As a
small example, take
Wilhelm Busch (hovering the mouse over the link opens the image).

The principle is similar to Snap Preview Anywhere (or WebSnapr), except that “WebImageSnap” only displays the linked image, dynamically via Ajax.

This has the following advantages:

  • in fairness, since it is only a link, you can see the source of the image (it appears as usual in the status bar at the bottom)
  • clicking the link takes you directly to the referenced site
  • the dreaded bandwidth theft is almost eliminated, since the image is only loaded when the reader (or rather “viewer”) explicitly wants to see it (and not on every page load)

The basic idea behind it is described in my blog.

Basics

WebImageSnap is just a small extension of the jQuery library. On the mouseover event, the given link is fetched as an image and displayed as a kind of tooltip.

Installation

This is a tutorial, after all. You’ll have to make a bit of an effort and build it into your own site yourself!

How to do it

First, let’s define the style for the popup. To do this, take your default stylesheet (in WordPress, e.g., style.css) and add the following style for the tooltip:

.ttimagesnap_style {
  text-align: center;
  font: 10px Arial,Helvetica,sans-serif;
  border: solid 1px #666666;
  background-color: #ffffff;
  padding: 1px;
  position: absolute;
  z-index: 100;
}

You can of course adapt the style to your own taste and your site.

Then, of course, you need to (if you haven’t already)
upload the jQuery download as “jquery.js” to your server, include it in a central place and add a little JavaScript code.

In WordPress, for example, this goes into header.php of the theme of your choice (in WordPress you don’t need to include jQuery itself; it already comes bundled for free).

Web Scraping

Part I

When I searched the web for something useful on “web scraping”, I was astonished by the lack of information. Isn’t there anything useful out there? So I decided to put something together myself. I know “web scraping” (or “screen scraping” in general) is a disgusting technique, and I have to admit: it usually makes me puke.

But, well, there are times when you have no other choice (or, even worse, you do have one, but it is even more horrible).

Having done several web scraping projects, I will put together some of my experience here. The following examples are shown in PHP and Tcl (version > 8.4.2 and tDOM 0.8). As far as I know, other languages (Ruby, for example) could easily be used with similar techniques.

But first of all a …

WARNING

Before starting to scrape something off the web, be sure there is no better way. Often you may find an official API that should be used (e.g. through Web Services or a REST API), or there are other services that deliver the needed information.

Moreover, convince yourself that web scraping is at least not forbidden. Some big sites state in their terms and conditions that scraping is not allowed. You should respect that. Furthermore, be aware that your requests add to the load of the target site. Always keep in mind that you are retrieving information in a way that is surely not intended by the site’s owners and operators. So be nice and don’t make too many requests.

If you’re taking content from other sites without the permission of the creators you will, depending on the usage of this content, violate copyright law.

Having said that, we start with the simplest method.

Regular expressions

That’s always the first method mentioned when somebody speaks of analyzing text (and “analyzing text” is in general what you do when you scrape a website). Though this might be feasible for grabbing specific bits of text from a page, you are in for hell if you want more.

So let’s look at a small example where a regular expression is enough. We want to extract the current value of the DAX.
There is certainly some webservice to retrieve this kind of data. But as I wanted to make a really simple example, let’s assume there is no way around scraping.

Have a look at any financial site and you will find some HTML similar to this:

  ...
  DAX
  5.560,13
  ...
HTML-Code 1

We are concentrating on the table with the row “DAX” and the column “Punkte” (points).
The DAX value can then be extracted simply with:

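<?php
// read the locally stored copy of the page (or pass the URL instead)
$html = file_get_contents('boerse.html');
// the tags around "DAX" and the value are assumed; adapt them to the real page
$regexp = '/>DAX<\/a>.*?<td[^>]*>(.*?)<\/td>/s';
if (preg_match_all($regexp, $html, $hit) && count($hit[1]) == 1) {
    print 'Found DAX: '.$hit[1][0];
} else {
    print 'Error! Retrieved '.count($hit[1]).' matches!';
}
?>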
PHP-Code 1

Or if you prefer to write that in Tcl:

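# either read a locally stored copy of the page ...
set f [open boerse.html r]; set html [read $f]; close $f
# ... or fetch it from the web
package require http
set token [::http::geturl "http://boerse.ftd.de/ftd/kurse_listen.htm"]
set html [::http::data $token]
::http::cleanup $token
# braces keep Tcl from substituting [^>]; the tags are assumed, adapt them
set regexp {>DAX</a>.*?<td[^>]*>(.*?)</td>}
# -all -inline returns the complete match plus the captured group for every hit
set l [regexp -all -inline $regexp $html]
if {[llength $l] == 2} {
    puts "Found DAX: [lindex $l 1]"
} else {
    puts "Error! Retrieved [expr {[llength $l] / 2}] matches"
}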
Tcl-Code 1
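To try the pattern without touching the network, here is a self-contained sketch that runs it against an inline snippet and enforces the “exactly one match” rule discussed below. The table markup in the snippet is an assumption (the tags of HTML-Code 1 are not shown), and extract_dax is a made-up helper name.

```tcl
# Hypothetical helper: return the single value following a linked "DAX" cell.
proc extract_dax {html} {
    # -all -inline yields {fullMatch capture fullMatch capture ...}
    set hits [regexp -all -inline {>DAX</a>.*?<td[^>]*>(.*?)</td>} $html]
    set count [expr {[llength $hits] / 2}]
    if {$count != 1} {
        error "expected exactly one match, got $count"
    }
    return [string trim [lindex $hits 1]]
}

# Assumed markup shaped like a typical quote table (not the original page).
set html {
<table>
  <tr>
    <td><a href="/dax">DAX</a></td>
    <td class="value">5.560,13</td>
  </tr>
</table>
}
puts "Found DAX: [extract_dax $html]"   ;# Found DAX: 5.560,13
```

Raising an error on zero or several matches, instead of silently taking the first hit, is what turns an unannounced page change into a visible failure.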

To make testing easier, I usually store the page locally. With file_get_contents you can simply switch from the locally stored file to the web address (as far as I know, there is nothing that easy in Tcl for switching between a file and a URL). As long as you’re trying to find the correct regular expression for the match, you should definitely work with a locally stored HTML file.
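A small helper can bring that convenience to Tcl. In this sketch, get_page (a hypothetical name) fetches the page with the http package when the argument starts with http://, and otherwise reads it as a local file. HTTPS is left out on purpose, since it would additionally require registering the tls package with ::http::register.

```tcl
package require http

# Hypothetical helper: return the HTML of a local file or of an http:// URL.
proc get_page {source} {
    if {[string match -nocase "http://*" $source]} {
        set token [::http::geturl $source]
        set html [::http::data $token]
        ::http::cleanup $token
        return $html
    }
    set f [open $source r]
    set html [read $f]
    close $f
    return $html
}

# While developing the pattern, use the locally stored copy ...
#   set html [get_page boerse.html]
# ... and switch to the live page once it works:
#   set html [get_page http://boerse.ftd.de/ftd/kurse_listen.htm]
```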

Make sure that this pattern matches only once, or you might retrieve the wrong part of the page. To ensure this, the regular expression pattern contains a little of the surrounding tags. This assumes that there is only one linked text “DAX” in a table cell, and that the next cell contains the number.

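Furthermore, in PHP add the modifier /s (treat the string as a single line, so that . also matches newlines) to the regular expression, because the text to match stretches over multiple lines (see “HTML-Code 1”) and I simply wanted to ignore that. In Tcl no switch is needed for this: . matches newlines by default (only -line or -linestop would change that).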

Because of unexpected and surely unannounced changes to the page (at least unannounced to you as a “nearly” anonymous scraper), make sure that you check for the right data. If the pattern doesn’t match, something is definitely wrong and you have to look at the HTML code for changes. If the pattern matches more than once, that is most likely wrong, too. That’s why I always use preg_match_all (or -all in Tcl).

Well, this was easy, and in fact I wouldn’t call this “web scraping”. If you want to scrape more than a single number or word from a page, forget about regular expressions.

We need something more powerful, something that can be used on nested structures. Have you ever tried to match paired opening and closing tags with regular expressions? No way! Go directly to jail! Do not pass go!

Part II

A more powerful way than regular expressions? Hardly imaginable? Small mind!

DOM

DOM is for correctly structured XML-like data only? Oh no, there is more. At least in PHP you can use the usual DOMDocument. And as far as I know, even “Internet Explorer” somehow handles badly formatted HTML and uses a
DOM representation internally. So there are other “convert bad-bad-bad HTML to DOM” tools out there.

Let’s start with another simple example. We want to find out how long a search on google takes.

First we have to feed the HTML into the DOMDocument (let’s search for “scraping”). To get the URL, just go to the website, enter “scraping” and copy the resulting URL into the code.

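// the search must be in the query string: a #q=... fragment is never sent to the server
$url = 'http://www.google.de/search?q=scraping';
$html = file_get_contents($url);
// create DOM from bad HTML; libxml collects the parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
if ($dom->loadHTML($html)) {
    // go on with parsing
}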
PHP-Code 2
package require tdom
package require http
# the search must be in the query string: a #q=... fragment is never sent to the server
set url "http://www.google.de/search?q=scraping"
set token [::http::geturl $url]
set html [::http::data $token]
::http::cleanup $token
# create DOM from bad HTML
if {![catch {dom parse -html $html} dom]} {
    set root [$dom documentElement]
    # go on with parsing
}
Tcl-Code 2

You will get tons of warnings from the method loadHTML. As we know that this is badly formatted HTML, we will silently ignore those.

If we got a DOM object, we start parsing the HTML. We do this with XPath. After analyzing the HTML code of the result page, you can find this specific text (newlines inserted for clarity):

HTML-Code 2

To find the duration of the search, we simply have to get the div tag with the id resultStats, and below that the nobr tag.

$xpath = new DOMXPath($dom);
// get the nobr-tag below the div-tag with id=resultStats
$queryTime    = "//div[@id='resultStats']/nobr";
$nodeTimeList = $xpath->query($queryTime);
if ($nodeTimeList && $nodeTimeList->length == 1) {
    print 'Query took: '.$nodeTimeList->item(0)->nodeValue;
    // further queries ... see below
} else {
    // something went wrong, do some error-management
}
PHP-Code 3

In Tcl this looks like this:

if {![catch {$root selectNodes {//div[@id='resultStats']/nobr}} nodeTimeList]
    && [llength $nodeTimeList] == 1} {
    # selectNodes returns a list of nodes, so take its first (and only) element
    puts "Query took: [[lindex $nodeTimeList 0] asText]"
    # further queries ... see below
} else {
    # something went wrong, do errorhandling
}
Tcl-Code 3

With the XPath expression //div[@id='resultStats']/nobr we get all nobr tags that are direct children of a div tag with the id attribute resultStats.

And because it is an id, there really should be only one. But you never know: the search might give no results. In that case the node list would be empty (and query() returns false if the expression itself is malformed), so we check that we got a node list and that it contains exactly one element ($nodeTimeList->length == 1). You should always check your results thoroughly to make sure they exactly meet your expectations.

If the search doesn’t return results you should think of some error-handling.

You will ask yourself: “Why haven’t we used the method getElementById?” It would return the node directly. But have a close look at this method. As mentioned in the
documentation, you have to call validate() first. You don’t expect HTML rubbish to validate, do you?

Now let’s print the search results.

Looking through the HTML code we find (newlines inserted for clarity):

Ergebnisse

With regular expressions we would now be in for complex parsing; with XPath we simply ask for these nodes:
//div[@id='results']//h3/a. The script looks like this (PHP first, then Tcl):

$nodeHitList = $xpath->query("//div[@id='results']//h3/a");
foreach ($nodeHitList as $node) {
    print $node->nodeValue;
}
foreach node [$root selectNodes {//div[@id='results']//h3/a}] {
    puts [$node asText]
}

Could it be shorter and cleaner? I guess not. Maybe we could add some error checking again? I will leave this as an exercise for you. ;-)

A word about the user agent

The way I retrieve the pages in the examples is surely the simplest one. When using file_get_contents, PHP doesn’t send a user-agent string with the request (unless one is configured). Retrieving the URL in Tcl with geturl sends the user agent “Tcl http client package” followed by the package version (at least in the version I used). In Tcl you can simply configure another user agent with

::http::config -useragent "lala"

In PHP you have to use a full-blown http-reader like HTTP_Request if you want to do more fancy things like setting the useragent or retrieving the pages through a proxy.

Setting the user agent might be necessary because the target page checks which browser is used, and “Tcl http client” is probably not the most common “browser” :-).

But as stated in the warning at the beginning, you should be honest and friendly toward the scraped site and identifying yourself as a “scraper” is one way to do that.
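Following that advice, here is a minimal sketch that sets an honest user agent. The bot name, version and contact address are placeholders to replace with your own.

```tcl
package require http

# Placeholder identity (assumption): name your scraper and give a contact.
set ua "ExampleScraper/0.1 (+mailto:scraper@example.org)"
::http::config -useragent $ua

# With a single option, ::http::config returns that option's current value.
puts [::http::config -useragent]
```

All later ::http::geturl calls in the same interpreter send this string, so the site operator can see who is scraping and whom to contact.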

Conclusions

If I’ve got some time, I will add some chapters concerning sessions (e.g. if you would like to get your bank balance automatically) and SSL, and maybe even some warnings about JavaScript.

But for the time being I’ll leave it as is, unless someone wants to improve this pidgin English (I’m always glad if someone corrects me, so please don’t hesitate to mail me any errors).

References

As said in the beginning, there is not much information around for this subject.

Professional screen-scraping software:

Tgdbm Library for Tcl (Version 0.5)

This is the documentation of the functions provided by the Gdbm-Tcl-Wrapper.

Overview

When opening a gdbm-database-file a handle for this database is provided for
accessing the file. This handle is used to call the gdbm-commands.

Furthermore you can use Tgdbm for transparently accessing and storing Tcl
arrays (persistent arrays). An array is attached to the gdbm file as a handle.
With this you can set an array entry, which is stored or updated transparently
in the corresponding gdbm file.

Tgdbm is meant to be used in Tcl applications that have to store a
small or medium amount of data. In many cases there is not enough data to
justify a “real” database.

But even though there is only a small amount of data, you often need an
efficient way to store it (rather than just writing it to a plain text file).

Because Tgdbm is provided as a loadable Tcl-Module, it can be easily
integrated into any Tcl-Application.

For information about downloads, … see the Tgdbm-overview.

Commands

Let’s have a look at a simple walk-through.

  package require tgdbm
  set gdbm_handle [gdbm_open -wrcreat -newdb first.gdbm]
  $gdbm_handle -insert store key1 {Some value to be stored.}
  set value [$gdbm_handle fetch key1]
  puts "value: $value"
  $gdbm_handle close

That’s nearly all there is. When opening/creating a gdbm database file, a handle
is returned, which is used as a Tcl command for accessing this
database file. When calling this command, you pass the “usual”
gdbm commands (like gdbm_store, …) without the gdbm_ prefix
as a parameter.

Now let’s use the persistent array feature:

  package require tgdbm
  gdbm_open -wrcreat -newdb -array my_array second.gdbm
  set my_array(1) one  ;# this will store 1/one in my_array and directly
                       ;# in file "second.gdbm"
  set my_array(2) two
  unset my_array(2)    ;# this will delete key '2' from second.gdbm
  set my_array(1) eins ;# this will update key '1' to 'eins'
  my_array -replace store 2 zwei
  set x $my_array(2)   ;# x == zwei
  unset my_array       ;# this will close the file second.gdbm
                       ;# this could be done with:
                       ;#  my_array close
                       ;#  too, which will also unset the array
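The example above relies on every read, write and unset of the array being intercepted. Tgdbm does this in C; purely to make the mechanism visible, here is a plain-Tcl sketch built on `trace add variable`, in which a dict in the global variable ::fake_gdbm stands in for the gdbm file. attach_persistent and persist_trace are hypothetical names, not Tgdbm commands.

```tcl
# Stand-in for the gdbm file (assumption): a dict of key -> value.
set ::fake_gdbm [dict create]

# Hypothetical helper, not a Tgdbm command: trace every access to ::$name.
proc attach_persistent {name} {
    array set ::$name {}
    trace add variable ::$name {read write unset} persist_trace
}

# Trace callback. It runs in the context of the code that touched the
# array, so "upvar 1" reaches the traced array itself.
proc persist_trace {name1 name2 op} {
    switch -- $op {
        read {
            # fetch on demand: fill the element from the store before it is read
            if {$name2 ne "" && [dict exists $::fake_gdbm $name2]} {
                upvar 1 $name1 arr
                set arr($name2) [dict get $::fake_gdbm $name2]
            }
        }
        write {
            # store or update the key (traces are suspended while we read it)
            upvar 1 $name1 arr
            dict set ::fake_gdbm $name2 $arr($name2)
        }
        unset {
            # empty name2: whole array unset (Tgdbm would close the file here)
            if {$name2 ne ""} {
                dict unset ::fake_gdbm $name2
            }
        }
    }
}

attach_persistent my_array
set my_array(1) one
set my_array(2) two
unset my_array(2)             ;# deletes key 2 from the store
set my_array(1) eins          ;# updates key 1
dict set ::fake_gdbm 3 drei   ;# pretend key 3 was already in the file
puts $my_array(3)             ;# the read trace fetches it: drei
puts $::fake_gdbm             ;# 1 eins 3 drei
```

Fetching on read corresponds to the default behaviour described for gdbm_open below (the file is read as needed), whereas -full_cache would correspond to loading every key into the array up front.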

The following commands are available, which are directly mapped from the
equivalent gdbm commands:

Furthermore there are some useful additional commands:

gdbm_open

Syntax: gdbm_open [option] file
Options: -reader (default), -writer, -wrcreat, -newdb, -nolock, -sync, -block_size block_size_number, -mode file-mode, -fatal_func function_name, -array array_name, -full_cache, -no_trailing_null
Gdbm-Command: gdbm_open
Return-Value: Handle (and Tcl command) for the database file file, or a list of the currently opened handles.
Description: If gdbm_open is called without any arguments, it simply returns all currently opened gdbm handles as a list.

The specified file is opened either for reading or for writing. Gdbm sets a lock on this file so that there is only one writer. With -wrcreat the file is created if it does not exist; -newdb creates a new file regardless of whether one already exists.

With -mode the file mode may be specified.

In case of a fatal error, a Tcl callback function may be specified with -fatal_func. This Tcl callback will be called if something inside Tgdbm crashes.

The fatal-error-function my_callback_fct must be defined as this:

 proc my_callback_fct {error_message} {
     ...
 }

Example:

gdbm_open -writer -newdb -nolock -fatal_func my_callback_fct help.gdbm

Opens/creates a new gdbm database file help.gdbm and doesn’t lock the file even though it will write to it (use this with great care).

Array-Handling (Version 0.5)

With -array array_name an array named array_name is attached to the gdbm file. When an array name is given, it is also the returned gdbm handle.

gdbm_open -writer -array my_array test.gdbm

returns my_array. If the given (global) array is already attached to a gdbm file, an error is thrown.

If -array is given, the caching mechanism can be specified with -full_cache, which means that the whole gdbm file is stored in the given array directly (be careful; use this only with small files). When -full_cache is not specified, the gdbm file is read as needed.

An array could be attached after opening a file with:

gdbm_handle attach_array (then the name of the array is the same as the handle).

Warning:

Tgdbm cannot determine whether a variable is global or local to the current procedure (Tcl_FindNamespaceVar could be used for that, but then the tgdbm library could not be compiled with stubs enabled, because this function is not available through the stubs interface). So if you want to use the array attached to a gdbm file inside a procedure, you have to make it a global array (otherwise you access a procedure-local array, which is not the same!).

Here is an example:

  gdbm_open -writer -array ini ini.gdbm
  set ini(font) Times
  proc setSize {size} {
    # wrong!!!: set ini(size) $size
    global ini
    set ini(size) $size
  }
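The pitfall is plain Tcl scoping and can be reproduced without Tgdbm. In this sketch (the proc names are made up for illustration), an unqualified array inside a proc is a new local variable, while both `global` and `upvar #0` reach the global array.

```tcl
array set ini {font Times}

# Wrong: without "global", ini is a new array local to this procedure.
proc setSizeWrong {size} {
    set ini(size) $size
}

# Right: "global" links the local name ini to the global array.
proc setSize {size} {
    global ini
    set ini(size) $size
}

# Also right: "upvar #0" links a local name (here cfg) to the global ini.
proc setColor {color} {
    upvar #0 ini cfg
    set cfg(color) $color
}

setSizeWrong 12
puts [info exists ini(size)]     ;# 0, the global array was not touched
setSize 12
setColor blue
puts [lsort [array names ini]]   ;# color font size
```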

End-of-string-Handling (Version 0.5)

-no_trailing_null: Usually all strings, whether stored or returned, carry the usual C-style string terminator (null). In some cases you don’t want to store this trailing null, or you have gdbm files whose keys and values were stored without a trailing null. In such rare cases you can use the argument -no_trailing_null, and Tgdbm will not store the end-of-string marker in the gdbm file.

close

Syntax: gdbm_handle close
Options: none
Gdbm-Command: gdbm_close
Return-Value: none
Description: Closes the database file associated with gdbm_handle, where gdbm_handle is the handle returned by gdbm_open.

store

Syntax: gdbm_handle [option] store key value
Options: -insert, -replace
Gdbm-Command: gdbm_store
Return-Value: none
Description: The given value is stored in the database file with the given key. If -insert is specified and the provided key is already stored in the database, an error is thrown (with error code GDBM_ILLEGAL_DATA).

fetch

Syntax: gdbm_handle [option] fetch key
Options: none
Gdbm-Command: gdbm_fetch
Return-Value: The value associated with key in the database.
Description: Fetch the key/value pair from the database.

exists

Syntax: gdbm_handle exists key
Options: none
Gdbm-Command: gdbm_exists
Return-Value: 0 or 1
Description: If the key exists, 1 is returned, otherwise 0.

delete

Syntax: gdbm_handle delete key
Options: none
Gdbm-Command: gdbm_delete
Return-Value: none
Description: Delete the given key from the database file. If the key does not exist, an error is thrown.

firstkey, nextkey

Syntax:gdbm_handle firstkey
gdbm_handle firstkey key
Options:none
Gdbm-Command:gdbm_firstkey, gdbm_nextkey
Return-Value:key
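Description:These commands iterate over all keys of the database-file (in no particular order). firstkey returns the first key, and nextkey returns the key following the given one; an empty string means there are no more keys. You can use them like this: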

  set gdbm [gdbm_open -reader file.gdbm]
  # an empty key marks the end of the iteration
  for {set key [$gdbm firstkey]} {$key ne ""} {set key [$gdbm nextkey $key]} {
      puts "key: '$key' value: '[$gdbm fetch $key]'"
  }
  $gdbm close

reorganize

Syntax:gdbm_handle reorganize
Options:none
Gdbm-Command:gdbm_reorganize
Return-Value:none
Description:After many deletes, the database-file does not shrink by itself. Call reorganize to rebuild the file and reclaim the unused space.

sync

Syntax:gdbm_handle sync
Options:none
Gdbm-Command:gdbm_sync
Return-Value:none
Description:Unless the gdbm-file was opened with the option -sync, writes are not flushed to disk immediately.
Use sync to force a flush to disk.
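A sketch of the housekeeping commands, assuming tgdbm is installed (the file name and the numbers are arbitrary): most entries are deleted again, reorganize rebuilds the file, and sync forces the result to disk. How much the file actually shrinks depends on gdbm, so the sketch only looks at the remaining entry count:

```tcl
package require tgdbm

file delete -force reorg_demo.gdbm
set gdbm [gdbm_open -wrcreat reorg_demo.gdbm]

# Fill the file, then delete most of the entries again
for {set i 1} {$i <= 1000} {incr i} {
    $gdbm -replace store $i "value $i"
}
for {set i 101} {$i <= 1000} {incr i} {
    $gdbm delete $i
}

# Rebuild the file so the space of the deleted entries is released
$gdbm reorganize
# Make sure everything has reached the disk before going on
$gdbm sync

set remaining [$gdbm count]
puts "entries left: $remaining"
$gdbm close
```

Since reorganize rewrites the whole file, it is best run occasionally after large batches of deletes rather than after every single one.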

count (extension of gdbm)

Syntax:gdbm_handle count
Options:none
Gdbm-Command:none
Return-Value:Total number of entries (key/value-pairs) in the gdbm-file
Description:This is equivalent to a “select count(*) from table” in a relational database.

maxkey (extension of gdbm)

Syntax:gdbm_handle maxkey
Options:none
Gdbm-Command:none
Return-Value:Number of the maximum primary-key
Description:Use this only when the primary-keys are integers (as is usually the case when an ID serves as primary key). A typical use is inserting a new element under a unique ID: use [expr {[gdbm_handle maxkey] + 1}] as its key.
You can also simulate sequences this way.
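A minimal sketch of generating IDs with maxkey, assuming tgdbm is installed; the helper next_id_store and the file ids.gdbm are hypothetical. The -insert option makes an accidental collision fail loudly instead of silently overwriting an entry:

```tcl
package require tgdbm

file delete -force ids.gdbm
set gdbm [gdbm_open -wrcreat ids.gdbm]

# Integer primary keys, as in a table with an ID column
$gdbm -replace store 1 "Addis Ababa"
$gdbm -replace store 2 "Munich"
$gdbm -replace store 3 "Zagreb"

# Hypothetical helper: store value under the next free ID and return the ID.
# -insert guards against overwriting an existing ID by mistake.
proc next_id_store {gdbm value} {
    set id [expr {[$gdbm maxkey] + 1}]
    $gdbm -insert store $id $value
    return $id
}

set new_id [next_id_store $gdbm "Vienna"]
puts "stored Vienna under ID $new_id"
$gdbm close
```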

attach_array (extension of gdbm)

Syntax:gdbm_handle attach_array
Options:none
Gdbm-Command:none
Return-Value:none
Description:Attaches an array to a gdbm-file that is already open. The array has the same name as gdbm_handle.

detach_array (extension of gdbm)

Syntax:gdbm_handle detach_array
Options:none
Gdbm-Command:none
Return-Value:none
Description:If you don’t want the transparent storing/retrieving of data through the array you can remove the array from the gdbm_handle with this command.
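A sketch of switching the array interface on and off, assuming tgdbm is installed and that writes to the attached array are stored as described for the -array option; attach_demo.gdbm and the key color are made up. Since the array is named after the handle, it is addressed here as ${gdbm}(...):

```tcl
package require tgdbm

file delete -force attach_demo.gdbm
set gdbm [gdbm_open -wrcreat attach_demo.gdbm]

# The attached array carries the handle's name
$gdbm attach_array
set ${gdbm}(color) blue        ;# stored in the file through the array

$gdbm detach_array
# From here on, use the explicit commands again
set value [$gdbm fetch color]
puts "color = $value"
$gdbm close
```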

keys (extension of gdbm)

Syntax:gdbm_handle keys
Options:none
Gdbm-Command:none
Return-Value:List of ALL keys from gdbm-file.
Description:This works like array names (without the option of a search-pattern). Because the complete list of keys is built in memory, use it with care on large database-files.
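A short sketch comparing keys and count, assuming tgdbm is installed (keys_demo.gdbm is a made-up file). For a small file, the length of the key list and count agree; for large files, count is the safer way to ask "how many?":

```tcl
package require tgdbm

file delete -force keys_demo.gdbm
set gdbm [gdbm_open -wrcreat keys_demo.gdbm]
foreach {code city} {ADD "Addis Ababa" MUC Munich ZAG Zagreb} {
    $gdbm -replace store $code $city
}

# keys returns the complete list at once; fine for small files
set all [lsort [$gdbm keys]]
puts "keys: $all"

# count returns only the number of entries
set n [$gdbm count]
puts "entries: $n"
$gdbm close
```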

Versions and Errors

Variablename/Syntax:GDBM_VERSION, GDBM_ERRNO, gdbm_strerror error-code
Options:none
Gdbm-Command:gdbm_version, gdbm_errno, gdbm_strerror
Return-Value:GDBM_VERSION contains the version-string; gdbm_strerror returns the error-description for an error-number
Description:The version-string is available in the variable GDBM_VERSION. In case of an error, the variable GDBM_ERRNO is set to the corresponding gdbm error-number (see gdbm.h for the individual error-numbers).

gdbm_strerror $GDBM_ERRNO returns the matching error-description. For most errors, this description is also used as the message of the Tcl error that is thrown.
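A sketch that prints the version and turns an error-number into text, assuming tgdbm is installed and that deleting a missing key sets GDBM_ERRNO like other errors (the file name and key are made up):

```tcl
package require tgdbm

puts "gdbm version: $GDBM_VERSION"

file delete -force errors_demo.gdbm
set gdbm [gdbm_open -wrcreat errors_demo.gdbm]

# Deleting a key that does not exist throws an error and sets GDBM_ERRNO
set failed [catch {$gdbm delete no-such-key} msg]
if {$failed} {
    puts "error number: $GDBM_ERRNO"
    puts "description:  [gdbm_strerror $GDBM_ERRNO]"
}
$gdbm close
```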

Variablename:gdbm_error
Description:To give access to the error-code defines of gdbm (e.g. GDBM_FILE_OPEN_ERROR, ...), Tgdbm provides the array gdbm_error. With it you can check GDBM_ERRNO for specific error-codes without relying on the integer-values of the gdbm defines.

The following “defines” (that is, array-entries) exist:

	gdbm_error(GDBM_NO_ERROR)
	gdbm_error(GDBM_MALLOC_ERROR)
	gdbm_error(GDBM_BLOCK_SIZE_ERROR)
	gdbm_error(GDBM_FILE_OPEN_ERROR)
	gdbm_error(GDBM_FILE_WRITE_ERROR)
	gdbm_error(GDBM_FILE_SEEK_ERROR)
	gdbm_error(GDBM_FILE_READ_ERROR)
	gdbm_error(GDBM_BAD_MAGIC_NUMBER)
	gdbm_error(GDBM_EMPTY_DATABASE)
	gdbm_error(GDBM_CANT_BE_READER)
	gdbm_error(GDBM_CANT_BE_WRITER)
	gdbm_error(GDBM_READER_CANT_DELETE)
	gdbm_error(GDBM_READER_CANT_STORE)
	gdbm_error(GDBM_READER_CANT_REORGANIZE)
	gdbm_error(GDBM_UNKNOWN_UPDATE)
	gdbm_error(GDBM_ITEM_NOT_FOUND)
	gdbm_error(GDBM_REORGANIZE_FAILED)
	gdbm_error(GDBM_CANNOT_REPLACE)
	gdbm_error(GDBM_ILLEGAL_DATA)
	gdbm_error(GDBM_OPT_ALREADY_SET)
	gdbm_error(GDBM_OPT_ILLEGAL)

Example:

set gdbm [gdbm_open -wrcreat file.gdbm]
# fetch throws an error if key1 is not stored
if {[catch {$gdbm fetch key1} result]} {
    if {$GDBM_ERRNO == $gdbm_error(GDBM_ITEM_NOT_FOUND)} {
        puts stderr "Item not found."
    }
}
$gdbm close

Examples

The Tgdbm-commands should be easy to use if you know the gdbm-library,
but a few examples help you get started right away.

Note that error-handling is kept to a minimum in these examples.

1. Store the contents of an array (passed by name) in a database-file

package require tgdbm
proc store_array {file data} {
    upvar $data dat
    # create file if it doesn't exist
    set gdbm [gdbm_open -wrcreat $file]
    foreach entry [array names dat] {
        $gdbm -replace store $entry $dat($entry)
    }
    $gdbm close
}
# ISBN -> book title
array set books {1-567xyz "XML Pocket Reference" abc "The Bible"}
store_array books.gdbm books

2. List the content of a database-file

See the example in the description of firstkey, nextkey.

3. Using the array-commands

Example with array-extension (since version 0.5).

package require tgdbm
gdbm_open -wrcreat -array books books.gdbm
# this one automagically stores the data in books.gdbm
# in the form (ISBN title)
array set books {
    1-567xyz "XML Pocket Reference"
    abc      "The Bible"
}
# now closing the file
unset books
# this could also be done with
# books close

4. gdbm-arrays and namespaces

This example shows how to use “global” gdbm-arrays inside namespaces:

package require tgdbm
namespace eval gdbm::ar {
    variable myArray
}
proc gdbm::ar::init {gdbm_file} {
    variable myArray
    gdbm_open -wrcreat -array myArray $gdbm_file
}
proc gdbm::ar::close {} {
    variable myArray
    myArray close
}
proc gdbm::ar::printData {} {
    variable myArray
    parray myArray
}
proc gdbm::ar::get {key} {
    variable myArray
    return $myArray($key)
}
proc gdbm::ar::fillData {args} {
    variable myArray
    array set myArray $args
}
gdbm::ar::init "airports.gdbm"
gdbm::ar::fillData ADD "Addis Ababa" MUC "Munich" ZAG "Zagreb"
gdbm::ar::printData
gdbm::ar::close

This example can be used directly to implement something
similar to ini-files (which store configurable user-data such as
window-positions or font-selections).