We can access the text as a list of characters, and access characters starting from position 10. However, this is not a very useful way to work with a text.
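
As a small illustration of the difference, the sketch below slices a raw string by character position and, for contrast, splits the same string into word and punctuation tokens with a regular expression. The sample string and the `tokenize` helper are assumptions made for this example; they stand in for the corpus text and for NLTK's own tokenizers.

```python
import re

# Hypothetical sample standing in for the raw corpus text.
raw = "O barato sai caro. Tal pai, tal filho."

# Character view: a short slice starting from position 10.
chars = list(raw[10:17])

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize(raw)

print(chars)   # individual characters, spaces included
print(tokens)  # words and punctuation marks
```

The character slice cuts through word boundaries, while the token list keeps each word and punctuation mark as one unit.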

We generally think of a text as a sequence of words and punctuation, not characters.

The tags consist of some syntactic information, followed by a plus sign, followed by a conventional part-of-speech tag. Let's strip off the material before the plus sign.

Here's a function that takes a word and a specified amount of context (measured in characters), and generates a concordance for that word.

We already know that n is the most common tag, so we can set up a default tagger that tags every word as a noun, and see how well it does.
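
The tag simplification and the default-tagger baseline described above can be sketched in plain Python. The tagged tokens below are a hypothetical sample in the "syntax+POS" style, and `simplify_tag`, `default_tagger` and `accuracy` are illustrative helpers rather than NLTK functions; in NLTK itself, `nltk.DefaultTagger` plays the role of the baseline.

```python
from collections import Counter

# Hypothetical tagged tokens: syntactic information, a plus sign, then the POS tag.
tagged = [
    ("um", ">N+art"), ("revivalismo", "H+n"), ("refrescante", "N<+adj"),
    ("ex-libris", "H+n"), ("de", "H+prp"), ("noite", "H+n"),
    ("algarvia", "N<+adj"), (".", "."),
]

def simplify_tag(tag):
    """Keep only the part-of-speech tag that follows the plus sign."""
    if "+" in tag:
        return tag[tag.index("+") + 1:]
    return tag

simplified = [(word.lower(), simplify_tag(tag)) for word, tag in tagged]

# Find the most common tag, then use it as the guess for every word.
tag_counts = Counter(tag for _, tag in simplified)
default_tag = tag_counts.most_common(1)[0][0]

def default_tagger(words, tag=default_tag):
    """Assign the same tag to every word."""
    return [(word, tag) for word in words]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted (word, tag) pair matches the gold pair."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

words = [word for word, _ in simplified]
baseline = accuracy(default_tagger(words), simplified)
print(default_tag, baseline)
```

On this toy sample the baseline only gets the nouns right, which is the point of a default tagger: it gives a floor that better taggers must beat.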

Let's improve on this by training a unigram tagger.

The sentence tokenizer can be trained and evaluated on other text. The source text from the Floresta Portuguese Treebank contains one sentence per line. We read the text, split it into its lines, and then join these lines together using spaces. Now the information about sentence breaks has been discarded. We split this material into training and testing data.
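
The data preparation just described might look like the following sketch. The in-memory `text` is a hypothetical stand-in for the treebank source file, and holding out the first two lines for testing is an arbitrary choice made for this example.

```python
# Hypothetical stand-in for the source text: one sentence per line.
text = "\n".join([
    "O barato sai caro.",
    "Tal pai, tal filho.",
    "Apressado come cru.",
    "Amor com amor se paga.",
    "Muito riso, pouco siso.",
])

# Split the text into its lines (one sentence each).
lines = text.split("\n")

# Hold out the first few lines for testing and keep the rest for training.
n_test = 2  # assumption: the split size is arbitrary here

# Joining with spaces discards the line breaks that marked sentence boundaries.
test = " ".join(lines[:n_test])
train = " ".join(lines[n_test:])

print(test)
print(train)
```

Because the original `lines` list is kept, it can still serve as the gold standard when checking where a trained tokenizer places sentence breaks in `test`.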

NLTK's data collection includes a trained model for Portuguese sentence segmentation. It is faster to load this trained model than to retrain it.

NLTK also includes a list of Portuguese stopwords, which we can use to filter text. Let's find the most frequent words other than stopwords and print them in descending order of frequency.

Type the name of the text or sentence to view it.
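
The stopword filtering step described above can be sketched as follows. The `stopwords` set is a small hypothetical subset and the sample text is made up for this example; with the NLTK data installed, the full list would come from `nltk.corpus.stopwords.words('portuguese')`.

```python
import re
from collections import Counter

# Hypothetical subset; the real Portuguese stopword list is much longer.
stopwords = {"a", "o", "de", "que", "e", "em", "um", "com", "se", "tem"}

# Sample text standing in for a corpus.
text = ("Amor com amor se paga. Ajoelhou, tem que rezar. "
        "O barato sai caro. Tal pai, tal filho.")

# Lowercase the word tokens and drop punctuation.
words = [w.lower() for w in re.findall(r"\w+", text)]

# Keep only the words that are not stopwords.
content = [w for w in words if w not in stopwords]

# Count the remaining words and print them in descending order of frequency.
freq = Counter(content)
for word, count in freq.most_common(5):
    print(word, count)
```

Lowercasing before the lookup matters here, since a stopword list is normally stored in lowercase and a sentence-initial "O" would otherwise slip through.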

Type: 'texts()' or 'sents()' to list the materials. Tudo nessa f mim mesmo. We can automatically generate random text based on a given text, e.

If marriage were a good thing, there would be no witnesses. Ajoelhou, tem que rezar (if you have knelt, you have to pray). Hope is the last one to die.

Hunger is the best seasoning. Necessity makes the frog jump. At night all cats are brown. Our union makes us stronger. Life begins at 40! Amor com amor se paga. Apressado come cru (the one in a hurry eats it raw). Dos males, o menor (of evils, the lesser). Muito riso, pouco siso (much laughter, little wisdom). Love should be paid with love. Appearances are deceiving. Walls have ears. Each head, a different judgement. Each monkey on its own branch. Everyone for himself, and God for all.

After the storm comes the calm. God writes straight with crooked lines.

Money doesn't sprout on trees. Better dripping than dry. Better preventing than fixing. A bird in the hand is worth two in the bush. It's like exchanging six for half a dozen. While there's life, there's hope. Too much alms, the saint will be suspicious. Children raised, doubled work. The one who left, lost his place. Some bad things come for good.

Much laughter, little wisdom. Don't cry over spilt milk. Don't put the cart before the oxen. O barato sai caro (cheap things end up expensive). Tal pai, tal filho (like father, like son). Crime doesn't pay. Has any cat bitten your tongue?

Habits make monks. The sun rises for everybody. A bent stick will never be straight. By its fruits one knows the tree. Pepper in someone else's eyes is refreshing. To eat and itch, all you have to do is begin. The more one has, the more he wants.

He who warns is a true friend. Who keeps quiet, agrees. Who sings throws sadness away. When you get married, you need a house. Who eats the meat, faces the bones. Who tells a tale, tells his own version.

When one disdains, he wants to buy. Nothing ventured, nothing gained. He who plays with pigs will eat with them. Who is in the rain, wants to be wet.

