Kerberos Issues With Podcast Producer / Xgrid on Leopard Server
I ran into several difficulties setting up Podcast Producer on Leopard Server. I followed the setup instructions in the manual, but when it came to getting Xgrid up and running, I hit a wall.
Here are the problems that I encountered:
“agent could not determine the expected controller service principal”
The Podcast Producer manual says that Kerberos authentication in Xgrid is necessary (page 26)…
However, by following the directions in the manual, I got an Xgrid agent that could not authenticate to the controller. I reported this problem in detail in the Apple Discussion Forums, but nobody replied.
Although it is likely a rare problem, I wasn’t the only one who had it.
I’m happy to report that I discovered what the “expected controller service principal” was, and how to fix it. The expected controller principal is defined in a text file: /private/etc/xgrid/controller/service-principal. Where else would it be? ;-). For me, it was set to “xgrid/hostname@MYREALM.CA”.
To fix the problem, get your real xgrid service principal from your keytab:
$ sudo klist -k | grep xgrid
3 xgrid/leopardserver.netmojo.ca@MYREALM.CA
3 xgrid/leopardserver.netmojo.ca@MYREALM.CA
3 xgrid/leopardserver.netmojo.ca@MYREALM.CA
Then replace whatever is in the service-principal file with the correct principal. In the Server Admin application, go to Xgrid -> Settings -> Agent, and put your FQDN (e.g., leopardserver.netmojo.ca), or whatever the host portion of your xgrid service principal is, in the “Use a Specific Controller” field. Restart Xgrid.
That alone seems to solve both this problem and the next one.
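The comparison between the keytab and the service-principal file can be sketched as a pair of shell helpers. This is only an illustration: the function names keytab_xgrid_principals and principal_matches are made up, and the sketch assumes that klist -k prints the key version number followed by the principal (as in the listing above) and that the service-principal file holds just the principal on one line.

```shell
# Hypothetical helpers, not part of Xgrid or Kerberos.

# Read `klist -k` output on stdin; print each distinct xgrid/... principal once.
keytab_xgrid_principals() {
    awk '$2 ~ /^xgrid\// { print $2 }' | sort -u
}

# Succeed if the principal in $1 exactly matches one of the lines on stdin.
principal_matches() {
    grep -Fqx -- "$1"
}

# Usage (assumes the file contains only the principal):
#   expected=$(cat /private/etc/xgrid/controller/service-principal)
#   sudo klist -k | keytab_xgrid_principals | principal_matches "$expected" \
#       || echo "mismatch: file says $expected"
```

An exact, whole-line match is used on purpose, since a principal that differs only in its host portion (hostname versus the FQDN) is exactly the failure described here.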
“xgridagentd: Error returned by gss_init_sec_context … Unspecified GSS failure”
At some point, I switched from the “expected controller service principal” error to this gss_init_sec_context/Unspecified GSS failure error. There were also some BEEPErrors (620, 600 and oddly 200) thrown in for good measure.
There was also a GSS minor error, “Server not found in Kerberos database”, which suggests that it was looking up the wrong service principal. So updating the /private/etc/xgrid/controller/service-principal file probably fixed this.
However, you might also want to check that there are Kerberos principals for your podcast producer users: pcastadmin, pcastuser, pcastxgrid (as described on page 24 of the Podcast Producer manual). Run:
# kadmin.local
kadmin.local: listprincs *cast*
pcastadmin@MYREALM.CA
pcastuser@MYREALM.CA
pcastxgrid@MYREALM.CA
These should have been added automatically when you created the users, but if they are not there, you can add them in kadmin.local. For example:
kadmin.local: addprinc -randkey pcastuser@MYREALM.CA
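A quick way to see which of the three principals is missing is to filter the listprincs output. The helper below is a sketch under assumptions: missing_pcast_principals is a made-up name, and it expects one principal per line on stdin, in the form shown above.

```shell
# Hypothetical helper, not part of kadmin.
# Read `listprincs` output on stdin; print each Podcast Producer principal
# (pcastadmin, pcastuser, pcastxgrid) that is absent for the realm given in $1.
missing_pcast_principals() {
    realm=$1
    found=$(cat)
    for user in pcastadmin pcastuser pcastxgrid; do
        printf '%s\n' "$found" | grep -Fqx -- "$user@$realm" \
            || printf '%s\n' "$user@$realm"
    done
}

# Usage (run on the Kerberos master; the realm is a placeholder):
#   sudo kadmin.local -q 'listprincs *cast*' | missing_pcast_principals MYREALM.CA
```

No output means all three principals are present. Anything printed is a candidate for addprinc.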
“_xgrid._tcp.local”
The automatic configuration of Xgrid via the “Configure Xgrid Service” button in the Xgrid panel of Server Admin.app sets up the controller to advertise its service via Bonjour, aka mDNS, at _xgrid._tcp.local. This results in these intermittent entries in my logs:
2/6/08 4:45:57 PM Unknown[30] Client application bug: DNSServiceResolver(leopardserver\.netmojo\.ca._xgrid._tcp.local.) active for over two minutes. This places considerable burden on the network.
I still haven’t found a way to prevent this. It looks like it’s wasting resources, but it doesn’t seem to affect the functioning of Xgrid or Podcast Producer.
“xgridagentd: Warning: agent error opening connection to controller “controller.netmojo.ca” (error = Unable to open: BEEPError 200 (Success))”
This was hard to figure out, but easy to fix. The error message is weird: in the BEEP RFC, reply code 200 means “Success”. What was actually failing was Kerberos, and Podcast Producer along with it. The BEEP error was accompanied by these messages:
PcastException: Unable to acquire Kerberos TGT for user pcastxgrid
Error: agent could not obtain Kerberos TGT for realm "NETMOJO.CA"
The problem turned out to be firewall related. The port for Network Time Protocol (NTP, which uses UDP port 123) was not open on this box, so the system time was not being set from the network. The KDC was a different machine, and the clocks had drifted too far apart. Kerberos puts timestamps in its messages and rejects requests when the clocks differ by more than the allowed skew (five minutes by default), so if the systems’ clocks are not synchronized, it won’t work. Podcast Producer was failing at the pre-authentication step, in which the client proves who it is by sending a timestamp encrypted with the user’s key. What solved it was opening the firewall, syncing the system clocks, then rebooting to clear credential caches. I suspect there is a better way of clearing credential caches, but I was in a hurry.
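To check whether clock drift is the culprit before touching anything else, the two clocks can be compared directly. The sketch below assumes the usual Kerberos default tolerance of 300 seconds (a KDC may be configured differently). The name clock_skew_ok and the host kdc.example.com in the usage comment are placeholders.

```shell
# Hypothetical helper. Succeeds when two Unix timestamps are within the
# allowed skew of each other.
# Usage: clock_skew_ok LOCAL_EPOCH KDC_EPOCH [MAX_SECONDS]
clock_skew_ok() {
    max=${3:-300}               # assumed default: 5 minutes
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then  # take the absolute value
        diff=$(( -diff ))
    fi
    [ "$diff" -le "$max" ]
}

# Usage (the hostname is a placeholder for your KDC):
#   clock_skew_ok "$(date +%s)" "$(ssh kdc.example.com date +%s)" \
#       || echo "clocks differ by more than the allowed skew"
```

For a user’s own tickets, kdestroy clears the credential cache without a reboot. Whether that also reaches the caches held by the Xgrid daemons is not something this sketch checks.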

April 3rd, 2008 at 00:15
Hi,
For me this did not solve the problem with “agent could not determine the expected controller service principal”. My xgrid service principal and the expected one are one and the same. Any ideas?
/jussi
April 3rd, 2008 at 14:30
Did you have to change the /private/etc/xgrid/controller/service-principal file? If so, did you restart Xgrid and the KDC after changing it? Is your actual FQDN equal to the name in that file, and does DNS resolve your IP address (forward and reverse lookups) to that name?
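One way to make that comparison less error-prone is to pull the host portion out of the principal and compare it, normalized, against what DNS returns. This is a sketch only: principal_host and normalize_fqdn are made-up names, and the usage lines assume the host command is available.

```shell
# Hypothetical helpers for comparing host names.

# Lowercase a name and strip the trailing dot that DNS tools often print.
normalize_fqdn() {
    printf '%s\n' "${1%.}" | tr '[:upper:]' '[:lower:]'
}

# Print the host portion of a principal such as xgrid/host.example@REALM.
principal_host() {
    host=${1#*/}       # drop the "xgrid/" service part
    host=${host%@*}    # drop the "@REALM" part
    normalize_fqdn "$host"
}

# Usage:
#   want=$(principal_host "$(cat /private/etc/xgrid/controller/service-principal)")
#   host "$want"                 # forward lookup: note the address returned
#   host <that address>          # reverse lookup: should print $want again
```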
April 4th, 2008 at 00:07
Also, try running:
If the FQDN of your kerberos server isn’t in the prefs:ControllerName field, you can set it with:
Yet another place to look for problems is in the plaintext xml file:
I’m interested to know how it goes. Good luck!
August 7th, 2008 at 05:09
hello
Interesting postings
I have the following:
$ sudo klist -k | grep xgrid
4 xgrid/servername.domain.ro@SERVERNAME.DOMAIN.RO
4 xgrid/servername.domain.ro@SERVERNAME.DOMAIN.RO
4 xgrid/servername.domain.ro@SERVERNAME.DOMAIN.RO
3 xgrid@SERVERNAME.DOMAIN.RO
3 xgrid@SERVERNAME.DOMAIN.RO
3 xgrid@SERVERNAME.DOMAIN.RO
2. $ kadmin.local
Couldn’t open log file /var/log/krb5kdc/kadmin.log: Permission denied
Authenticating as principal adminuser/admin@SERVERNAME.DOMAIN.RO with password.
kadmin.local: Permission denied while initializing kadmin.local interface
August 7th, 2008 at 09:31
George, it must be executed with root privileges. Prefix with “sudo”.
August 8th, 2008 at 00:08
Thank you for the answer. The thing is that I am still not able to start Xgrid with Kerberos authentication.
Here is the error from kadmin.local. Any other ideas would be really appreciated:
$ sudo kadmin.local
Password:
Authenticating as principal root/admin@SERVERNAME.DOMAIN.RO with password.
kadmin.local: No such file or directory while initializing kadmin.local interface
August 8th, 2008 at 11:34
kadmin.local must be executed on the master Kerberos server, as root.
This makes me wonder if it’s possible to run an Xgrid controller on an OD replica server. I’ve tried getting it to work, but so far, no luck.
Another topic that I have yet to fully investigate is the issue of Kerberos slave (replica) servers in OSX Server. This article:
http://www.afp548.com/article.php?story=20060724104018616
… says that each OD replica becomes a Kerberos master server, and changes get replicated back to the other Kerberos masters by some unique-to-OSX mechanism. However, the article was written at the time of OS X Server 10.4. In 10.6, my OD replica doesn’t appear to be a Kerberos master, and I cannot run kadmin.local on it.
June 6th, 2011 at 13:43
Brent:
My PcP initially worked. I could submit jobs, they would get processed, etc. Kerberos works for both e-mail and Podcast Capture login.
Now it has quit working. The PcP GUI indicates it can’t find it. The xcontroller activity logs indicate the agent has been contacted with a job to process. The actual xcontroller logs indicate it has shut down. Something has clearly changed. Also the three
pcastadmin@MYREALM.CA
pcastuser@MYREALM.CA
pcastxgrid@MYREALM.CA
were never in my OD users list. I didn’t try to create them though. I don’t see the errors that you describe above. That doesn’t mean they aren’t there, though, or that they won’t pop up as soon as I get beyond where I am now. I believe the worst case is to reinstall the software and start from scratch. It will work