Integrating AD of a NN with an external optimization scheme

Hello friends!

I need to evaluate the derivative of an objective function used in an external optimization scheme. Part of the model is a neural network (NN), so, through the chain rule, one factor of the derivative can be evaluated externally and the other factor comes from AD over the NN. I plan to assemble a framework with SciPy and TensorFlow for this.
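In symbols (my notation), with J the objective, y the NN output, and p the NN parameters, I mean something like

dJ/dp = (dJ/dy) * (dy/dp),

where dJ/dy is the part I can evaluate externally and dy/dp is the part I want to obtain by AD over the NN.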

I’m quite new to TensorFlow and, honestly, don’t know how to do this efficiently (although I need to). What puzzles me now is how to compute the derivative of the NN model’s output neurons with respect to its weights and biases (parameters). I can use tape.jacobian, but it returns a list of tensors organized by layer (one per trainable variable). I need them organized per input sample instead, as a matrix:

dy1(x1)/dp1 dy1(x2)/dp1 … dy1(xn)/dp1
dy1(x1)/dp2 dy1(x2)/dp2 … dy1(xn)/dp2
…
dy1(x1)/dpm dy1(x2)/dpm … dy1(xn)/dpm
dy2(x1)/dp1 dy2(x2)/dp1 … dy2(xn)/dp1
dy2(x1)/dp2 dy2(x2)/dp2 … dy2(xn)/dp2
…
dyl(x1)/dpm dyl(x2)/dpm … dyl(xn)/dpm

n: number of input samples; m: total number of trainable parameters (all weights and biases); l: number of output neurons. Hence, the matrix would be (l*m) x n.
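Here is a minimal sketch of what I currently have in mind (the model, sizes, and reshaping steps are placeholders I made up for illustration; I don't know whether this is the efficient way to do it):

```python
import tensorflow as tf

# Toy stand-in for my setup: n input samples, l output neurons,
# m total trainable parameters (names and sizes are just for illustration).
n, input_dim, l = 5, 3, 2
x = tf.random.normal((n, input_dim))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),
    tf.keras.layers.Dense(4, activation="tanh"),
    tf.keras.layers.Dense(l),
])
m = sum(int(tf.size(v)) for v in model.trainable_variables)

with tf.GradientTape() as tape:
    y = model(x)  # shape (n, l)

# One tensor per trainable variable, each of shape (n, l) + variable.shape
jac_per_var = tape.jacobian(y, model.trainable_variables)

# Flatten each per-variable block to (n, l, p_k) and concatenate along the
# parameter axis, giving shape (n, l, m).
jac = tf.concat([tf.reshape(j, (n, l, -1)) for j in jac_per_var], axis=-1)

# Reorder to (l, m, n) and collapse the first two axes to get the (l*m) x n
# matrix described above (rows: all parameters of y1, then of y2, ...).
jac_matrix = tf.reshape(tf.transpose(jac, perm=[1, 2, 0]), (l * m, n))
print(jac_matrix.shape)  # (l*m, n)
```

The per-variable tensors from tape.jacobian have shape (n, l) + variable.shape, so I flatten, concatenate, and transpose them to get the (l*m) x n layout, but for larger models this feels wasteful, hence the question.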

Is it possible to do this with jacobian, or should I use gradient instead?