NLP e Língua Portuguesa

Posted: November 24th, 2009 | Author: sofia | Filed under: curious | Tags: , , | No Comments »

Desde há algum tempo que tenho alguma curiosidade sobre NLP (natural language processing) tanto a nível teórico (como funcionam os algoritmos que usam) como a nivel prático. Não existe por exemplo, um serviço semelhante ao Apture para português. Assim, derivados das minhas pesquisas aqui vão alguns links que achei interessantes:

LXSuite - um conjunto de webservices para a análise linguística de texto desenvolvido pela Universidade Lisboa. Na LxSuite estão todos juntos mas podem ser vistos em separado no LxCenter. Infelizmente, o serviço não tem api e é necessário consentimento para o seu uso.

LXService:  Web Services of Language Technology for Portuguese - artigo a descrever o desenvolvimento dos webservices acima.

Linguateca - todo um conjunto de recursos de NLP para a língua portuguesa. O HAREM - NER for portuguese e respectivo livro está aqui. Eles também disponibilizam vários corpus para português aqui, inclusivé o CETEMPúblico (disponível para download gratuitamente).

Portuguese Language Processing Service - um artigo sobre o desenvolvimento dum conjunto de webservices de NLP para língua portuguesa desenvolvido no Brasil. De acordo com os próprios:

In this paper, we describe F-EXT-WS, a Portuguese Language Processing Service that is now available at the Web. The first version of this service provides Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. All these tools were built with the Entropy Guided Transformation Learning algorithm, a state-of-the-art Machine Learning algorithm for such tasks.

Este artigo parece interessante e pode ser um ponto de partida para outras explorações (ex. ETL/Entropy Guided Transformation Learning - ver Portuguese corpus-based learning using ETL). Fui ver o F-EXT-WS mas é necessário registo e deu erro.

Alguém conhece outros recursos interessantes dentro desta àrea?


Standalone cache according to http requests

Posted: July 27th, 2008 | Author: sofia | Filed under: curious | Tags: | 8 Comments »

Maybe this already exists but since my searches didn’t turn up anything, i thought i’d post this.

You have an app coded more or less rest style. Every post request implies there was a data change (-> cache becomes stale), every get request implies there was no change in the data (-> cache stays fresh). So you know that if a post request was made to domain.com/admin/news, the news cache becomes stale. I won’t go really deep here, in that if you change item 8 of the news table, you might only have 2 stale caches, the one that lists the news and the one that shows item 8 of the news table ( ie. domain.com/news and domain.com/news/8 or domain.com/news/title-of-article) and not every cache belonging to the news group but let’s keep it simple here.

I would like to know if there’s anything out there that parses the apache logs for post requests and if there was a post/put/delete in any url, according to a few configurable rules, it will automatically do a get to the correspondent url. For example, if a post was made to domain.com/admin/news/8 then it would be able to, upon parsing of the apache logs, do a get request to domain.com/news/8, generating the cache for the next user that comes along instead of waiting for the next user to generate a fresh cache - keeping him waiting . It would just increase the cache hit ratio per user. It would of course run as a cron job.

I like this solution because it really keeps the caching code (if cache exists, expire cache, use cache, etc) outside the app, becoming simply another layer, where it really should be.

I really think that this makes sense from a rest perspective so i suspect it’s already out there..

So anyone know of anything? Preferably in php, but python or ruby is ok too.

Thanx :=)